Optimize PostView queries by doing limit and order *before* other joins #2994
Comments
There may be another way to resolve this: we could estimate beforehand what the top 50 rows should be, based on stats from a sample, and then do the joins on those 50 to limit them to 40 (50 to give some leeway).
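The general shape of the idea (and of the issue title, "limit and order *before* other joins") might look something like the following sketch. This is an assumption about Lemmy's schema, not the actual query; table and column names (`post_aggregates`, `post`, `hot_rank`) are taken from the discussion below, and the subquery alias is invented for illustration:

```sql
-- Sketch: rank and limit on the small aggregate table first,
-- so the expensive joins only ever see ~50 rows instead of the whole table.
SELECT p.*, top.hot_rank
FROM (
    SELECT post_id, hot_rank
    FROM post_aggregates
    ORDER BY hot_rank DESC
    LIMIT 50                      -- overshoot slightly, per the comment above
) AS top
JOIN post p ON p.id = top.post_id
-- ...remaining joins (community, person, etc.) now operate on at most 50 rows
LIMIT 40;
```

The risk, as the comment notes, is that the pre-limit estimate can be wrong once later joins filter rows out, which is why it overshoots to 50.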
I tried and failed to implement the above solution, although it is technically possible in diesel. To outline some of the difficulties:
I'd prefer to hold off on adding the …
Hey. I want to mention this: one large issue with the post hot-rank queries when filtering by community_follower (so the user-specific page) is that the post_aggregates.hot_rank column is in a different table than the post.community_id column. This is relevant because it means PostgreSQL cannot correlate these things directly, even though what you're looking for is the hottest posts for the communities a user follows!

As an example, I took the most expensive (average time 350 ms) posts query from lemmy.world. Then I ran:

```sql
alter table post_aggregates add column community_id integer;

update post_aggregates
set community_id = post.community_id
from post
where post.id = post_aggregates.post_id;
```

to copy the community_id over to post_aggregates. Then I added an index, and changed the query to join community_follower with post_aggregates.community_id instead of post.community_id. This allows PG to get the hottest posts for each community_id. With this change, the query goes from 350 ms to 7 ms!

Here's the EXPLAIN ANALYZE before the change: https://explain.depesz.com/s/YSn6
And here it is after the change: https://explain.depesz.com/s/PEOH

Ok, funnily enough, it didn't actually use the index. But the query is still much faster because it can remove rows more quickly (it doesn't have to look up the community_id separately 100k times), even though it had a bad estimate on the rows found by idx_post_aggregates_featured_local_hot.

Edit: found the reason why it wasn't using the index: the index was wrong. The query filters by community_follower.community and sorts by featured_local, hot_rank. Adding an index on `(community_id, featured_local desc, hot_rank desc)` fixes it and makes the query work as I would have expected.
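For reference, the corrected index described in the edit above could be created roughly as follows. The column list comes straight from the comment; the index name is hypothetical, since the actual name used on lemmy.world isn't given:

```sql
-- Matches the query shape: equality filter on community_id,
-- then sort by featured_local DESC, hot_rank DESC.
CREATE INDEX idx_post_aggregates_community_featured_hot
    ON post_aggregates (community_id, featured_local DESC, hot_rank DESC);
```

Putting the filtered column first and the sort columns next (in the query's sort direction) is what lets Postgres walk the index in output order per community instead of sorting after the fact.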
I can confirm what @phiresky mentioned above: joining …
From here
Context #2877
cc @Nutomic @johanndt