
Add back fast path for non-gappy syncs #17064

Merged: 3 commits merged into develop from erikj/optimisation_sync, Apr 8, 2024
Conversation

erikjohnston (Member) commented on Apr 8, 2024:

PR #16942 removed an invalid optimisation that avoided pulling out state for non-gappy syncs. This caused a large increase in DB usage; cf. #16941 for why that optimisation was wrong.

However, we can still optimise in the simple case where the events in the timeline form a linear chain, without any branching or merging of the DAG.

cc @richvdh
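To make the shape of the fast path concrete, here is a minimal, self-contained sketch of the linearity check quoted in the review below. The `Event` and `TimelineBatch` classes are simplified stand-ins for Synapse's real types, and `can_skip_state_delta` is a hypothetical name for the gating logic:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    """Toy stand-in for a Matrix event; real events carry much more."""
    prev_events: List[str]

    def prev_event_ids(self) -> List[str]:
        return self.prev_events

@dataclass
class TimelineBatch:
    events: List[Event]
    limited: bool  # True when the sync is "gappy" (timeline truncated)

def can_skip_state_delta(batch: TimelineBatch) -> bool:
    # An event with more than one prev event is a merge point in the
    # DAG, so the batch is linear iff every event has at most one.
    is_linear_timeline = all(
        len(e.prev_event_ids()) <= 1 for e in batch.events
    )
    # With no gap and a linear chain, the state at the end of the
    # previous sync carries straight through, so no delta is needed.
    return is_linear_timeline and not batch.limited

# Example: a two-event linear chain in a non-gappy sync takes the fast path.
batch = TimelineBatch(events=[Event(["$e1"]), Event(["$e2"])], limited=False)
assert can_skip_state_delta(batch)
```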

erikjohnston marked this pull request as ready for review, April 8, 2024 10:53
erikjohnston requested a review from a team as a code owner, April 8, 2024 10:53
Comment on lines 1269 to 1270:

```python
is_linear_timeline = all(len(e.prev_event_ids()) <= 1 for e in batch.events)
if is_linear_timeline and not batch.limited:
```
richvdh (Member) commented:

Have you considered how this behaves in longer-lived forks:

```
             E1
           ↗    ↖
          |      S2
          |      ↑
          E3     |
          ↑      |
        --|------|----   <- prev sync
          |      |
          E4     E5
          ↑      |
        --|------|----   <- this sync
          |      |
           ↖    /
             E6                (the distant future)
```

E4 and E5 both have single prev events, but I'm not convinced it is safe to drop the state delta between the forks here?
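Expressed with the toy classes from the sketch above, this counterexample shows why the per-event check alone is not sufficient (the event names follow the diagram; this is an illustration, not Synapse code):

```python
# E4 continues the fork through E3; E5 continues the fork through S2.
e4 = Event(prev_events=["$E3"])
e5 = Event(prev_events=["$S2"])
forked_batch = TimelineBatch(events=[e4, e5], limited=False)

# Each event has exactly one prev event, so the per-event linearity
# check passes, and the fast path would drop the state delta between
# the two forks (e.g. the state event S2).
assert can_skip_state_delta(forked_batch)
```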

erikjohnston (Member, Author) replied:

The state returned is currently `(timeline_start | timeline_end) - previous_timeline_end - timeline_contains`. If we have a linear chain with no gaps, my assumption is that `timeline_start == previous_timeline_end` and `timeline_end == timeline_start + timeline_contains`, which then all cancels out.

But bleurghghghghghg you're right that we need to actually check that all the events are in the same chain and point to the previous timeline end. BLEURGH.

(Though does the current code sensibly work in this case?)
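The cancellation argument can be checked mechanically. A toy demonstration under the two stated assumptions, modelling state as plain sets of event IDs (a simplification of Synapse's state maps, which are keyed on event type and state key):

```python
previous_timeline_end = {"$create", "$power_levels", "$member_alice"}
timeline_contains = {"$member_bob"}  # state events inside the timeline

# Assumption 1: a linear, gap-free chain starts where the last sync ended.
timeline_start = set(previous_timeline_end)
# Assumption 2: the end state is the start state plus what the timeline adds.
timeline_end = timeline_start | timeline_contains

state = (timeline_start | timeline_end) - previous_timeline_end - timeline_contains
assert state == set()  # everything cancels out: no state delta to return
```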

richvdh (Member) replied:

> (Though does the current code sensibly work in this case?)

WHO KNOWS

richvdh (Member) added:

More constructively: it would be good to add a test for this case.

erikjohnston (Member, Author) replied:

Having thought about this over lunch:

> If we have a linear chain with no gaps, my assumption is that timeline_start == previous_timeline_end

I think with long-lived forks this may or may not be true, depending on whether the end and start are part of the same fork. In the present code, we'll come to the wrong answer if that is the case, and in the new code it will come to the wrong answer either way.

Given that, I'm somewhat tempted to accept that and fix the performance regression. The other option is to do an extra DB hit to check if the event ID corresponding to previous_timeline_end matches the prev event of the start of the timeline.

I think if we want to remove all these edge cases we'll need to change all this to try and use the current state (based on the extremities at the time), though that's quite a big change (but perhaps we can do it for sliding sync?)
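A sketch of the "extra DB hit" option mentioned above. The storage call is hypothetical (Synapse's real storage API differs); the point is only that the previous sync's end position must be resolved to an event ID and compared against the first timeline event's prev event:

```python
async def timeline_extends_prev_sync(store, batch, since_token) -> bool:
    """Hypothetical check: does this batch point directly at the event
    the previous sync's timeline ended on (i.e. the same fork)?"""
    if not batch.events:
        return True  # an empty batch is trivially linear
    # Hypothetical lookup: resolve the stream position to an event ID.
    prev_end_id = await store.get_event_id_at_stream_pos(since_token)
    return batch.events[0].prev_event_ids() == [prev_end_id]
```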

anoadragon453 (Member) left a comment:

Looks to optimise for the linear case.

erikjohnston merged commit 1f8f991 into develop, Apr 8, 2024; 38 checks passed
erikjohnston deleted the erikj/optimisation_sync branch, April 8, 2024 13:25
erikjohnston added a commit that referenced this pull request Apr 8, 2024
erikjohnston mentioned this pull request Apr 8, 2024
erikjohnston added a commit that referenced this pull request Apr 8, 2024
Forgot a line, and an empty batch is trivially linear.

cf. #17064
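The "trivially linear" remark follows from Python's vacuous-truth semantics: `all()` over an empty iterable returns True, so an empty batch passes the linearity check without any special-casing:

```python
batch_events = []  # an empty timeline batch
# The generator body never runs, and all() of nothing is True.
assert all(len(e.prev_event_ids()) <= 1 for e in batch_events)
```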
hughns pushed commits to hughns/synapse that referenced this pull request, Apr 9, 2024
yingziwu added a commit to yingziwu/synapse that referenced this pull request Apr 19, 2024
No significant changes since 1.105.0rc1.

Features:

- Stabilize support for [MSC4010](matrix-org/matrix-spec-proposals#4010) which clarifies the interaction of push rules and account data. Contributed by @clokep. ([\#17022](element-hq/synapse#17022))
- Stabilize support for [MSC3981](matrix-org/matrix-spec-proposals#3981): `/relations` recursion. Contributed by @clokep. ([\#17023](element-hq/synapse#17023))
- Add support for moving `/pushrules` off of main process. ([\#17037](element-hq/synapse#17037), [\#17038](element-hq/synapse#17038))

Bugfixes:

- Fix various long-standing bugs which could cause incorrect state to be returned from `/sync` in certain situations. ([\#16930](element-hq/synapse#16930), [\#16932](element-hq/synapse#16932), [\#16942](element-hq/synapse#16942), [\#17064](element-hq/synapse#17064), [\#17065](element-hq/synapse#17065), [\#17066](element-hq/synapse#17066))
- Fix server notice rooms not always being created as unencrypted rooms, even when `encryption_enabled_by_default_for_room_type` is in use (server notices are always unencrypted). ([\#17033](element-hq/synapse#17033))
- Fix the `.m.rule.encrypted_room_one_to_one` and `.m.rule.room_one_to_one` default underride push rules being in the wrong order. Contributed by @Sumpy1. ([\#17043](element-hq/synapse#17043))

Internal changes:

- Refactor auth chain fetching to reduce duplication. ([\#17044](element-hq/synapse#17044))
- Improve database performance by adding a missing index to `access_tokens.refresh_token_id`. ([\#17045](element-hq/synapse#17045), [\#17054](element-hq/synapse#17054))
- Improve database performance by reducing number of receipts fetched when sending push notifications. ([\#17049](element-hq/synapse#17049))

Updates to locked dependencies:

* Bump packaging from 23.2 to 24.0. ([\#17027](element-hq/synapse#17027))
* Bump regex from 1.10.3 to 1.10.4. ([\#17028](element-hq/synapse#17028))
* Bump ruff from 0.3.2 to 0.3.5. ([\#17060](element-hq/synapse#17060))
* Bump serde_json from 1.0.114 to 1.0.115. ([\#17041](element-hq/synapse#17041))
* Bump types-pillow from 10.2.0.20240125 to 10.2.0.20240406. ([\#17061](element-hq/synapse#17061))
* Bump types-requests from 2.31.0.20240125 to 2.31.0.20240406. ([\#17063](element-hq/synapse#17063))
* Bump typing-extensions from 4.9.0 to 4.11.0. ([\#17062](element-hq/synapse#17062))