This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

/state_ids is slow to respond #13620

Open
Tracked by #15182
MadLittleMods opened this issue Aug 24, 2022 · 1 comment
Labels
A-Federation A-Messages-Endpoint /messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill) A-Performance Performance, both client-facing and admin-facing O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements.

Comments

@MadLittleMods
Contributor

MadLittleMods commented Aug 24, 2022

/state_ids is slow to respond. The 99th percentile sits almost solidly above 10s (we can't tell how much higher), and the 75th percentile is around 5s 🐢. This seems like it should be much faster, since (simplifying) we only need to dump out what we already have in the database.

A slow /state_ids also slows down /messages whenever we backfill and need to work out the state at an event.

https://grafana.matrix.org/d/dYoRgTgVz/messages-timing?orgId=1&from=1661359694721&to=1661381294721&viewPanel=197
FederationStateIdsServlet response time, 75th percentile is 5.32s


As an example, for #matrix:matrix.org, /state_ids took 30s to respond. We don't know what t2bot.io is doing, but chances are we can improve things for the whole federation, because matrix.org is also slow to respond and we can actually trace it:

(also, what's that mystery gap at the end after we encode the JSON response (encode_json_response)?)

Related to:

Rate limiter

We could gain an easy ~1s by tuning the rate limiter to sleep less, if our servers can handle it.

https://grafana.matrix.org/d/dYoRgTgVz/messages-timing?orgId=1&from=1661358785681&to=1661380385682&viewPanel=204
Rate limit stats in grafana, 90th percentile is at 669ms
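
To get a feel for how much time the sleeping can add, here is a rough Python model of a per-host limiter that sleeps once too many requests land in a sliding window. It is a simplified sketch, not Synapse's actual implementation: `added_sleep_ms` is a hypothetical helper, the parameter names only loosely mirror the `rc_federation` options (`window_size`, `sleep_limit`, `sleep_delay`), and it ignores concurrency limits and rejections.

```python
from typing import Iterable, List


def added_sleep_ms(
    request_times_ms: Iterable[int],
    window_size_ms: int,
    sleep_limit: int,
    sleep_delay_ms: int,
) -> int:
    """Total artificial delay added to requests arriving from a single host.

    Simplified model (an assumption, not Synapse code): every request that
    arrives while more than `sleep_limit` requests fall inside the sliding
    window is delayed by `sleep_delay_ms`.
    """
    total = 0
    window: List[int] = []
    for now in request_times_ms:
        # Drop arrivals that have fallen out of the sliding window.
        window = [t for t in window if t > now - window_size_ms]
        window.append(now)
        if len(window) > sleep_limit:
            total += sleep_delay_ms
    return total


# Hypothetical burst: 12 requests, one every 50ms.
burst = [i * 50 for i in range(12)]
strict = added_sleep_ms(burst, window_size_ms=1000, sleep_limit=10, sleep_delay_ms=500)
relaxed = added_sleep_ms(burst, window_size_ms=1000, sleep_limit=20, sleep_delay_ms=500)
```

In this toy model, raising `sleep_limit` or lowering `sleep_delay_ms` directly reduces the added latency for bursty callers. Whether the servers can absorb the extra load is the open question.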

Using less /state_ids

It might be better if we could work with fewer state events every time /state_ids is called. It would be nice to have a way to get the delta between two events (the unknown event and the event where your server already has state); see #13618.
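
As a rough illustration of the delta idea, the sketch below diffs two state maps keyed by `(event type, state_key)`. It is only a minimal sketch under stated assumptions: `state_delta` and `apply_delta` are hypothetical helpers, the event IDs are made up, and nothing here is an existing Synapse or federation API or the exact design proposed in #13618.

```python
from typing import Dict, List, Tuple

StateKey = Tuple[str, str]      # (event type, state_key)
StateMap = Dict[StateKey, str]  # maps to the event_id of the state event


def state_delta(known: StateMap, target: StateMap) -> Tuple[StateMap, List[StateKey]]:
    """Return the entries to add or replace, and the keys to drop, to turn
    `known` (state we already have) into `target` (state at the unknown event)."""
    changed = {key: eid for key, eid in target.items() if known.get(key) != eid}
    removed = [key for key in known if key not in target]
    return changed, removed


def apply_delta(known: StateMap, changed: StateMap, removed: List[StateKey]) -> StateMap:
    """Rebuild the target state from the known state plus a delta."""
    dropped = set(removed)
    result = {key: eid for key, eid in known.items() if key not in dropped}
    result.update(changed)
    return result


# Made-up example state; only the membership and name entries differ.
known_state: StateMap = {
    ("m.room.create", ""): "$create",
    ("m.room.member", "@alice:example.org"): "$join1",
    ("m.room.topic", ""): "$topic1",
}
target_state: StateMap = {
    ("m.room.create", ""): "$create",
    ("m.room.member", "@alice:example.org"): "$join2",
    ("m.room.name", ""): "$name1",
}
changed, removed = state_delta(known_state, target_state)
rebuilt = apply_delta(known_state, changed, removed)
```

For a large room where only a handful of state events differ between the two points, the delta would be far smaller than the full list of state event IDs.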

@MadLittleMods MadLittleMods added A-Federation A-Messages-Endpoint /messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill) labels Aug 24, 2022
@MadLittleMods MadLittleMods changed the title /state_ids is slow /state_ids is slow to respond Aug 24, 2022
@DMRobertson DMRobertson added S-Major Major functionality / product severely impaired, no satisfactory workaround. O-Uncommon Most users are unlikely to come across this or unexpected workflow T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements. A-Performance Performance, both client-facing and admin-facing and removed A-Performance labels Aug 25, 2022
@clokep
Member

clokep commented Nov 22, 2022

See #7893 for previous work on this. (That also links out to some other bits.)


3 participants