Horizon: /account/:id/payments timeout error for accounts with large number of trades (#1808)

This request fails with a 504 error after a ~20s loading timeout:

https://horizon.stellar.org/accounts/GAMU5TQFUMDGVKYQPPDCD2MKKUUWELSQAEKNNU4RFQCWFSRBPJA2MAGQ/payments?order=asc&limit=200

I suspect that the root of the problem lies in the huge number of trades (120K trades and only 3 payments). The same endpoint for other accounts shows decent performance (for example, this one).
Interesting - I just tried it and it came back in under 5s.
Checked again. Ran it 5 times; it took about 5s on average. Could you please check the Horizon server logs? To my mind, the problem is worth investigating. In my experience, any query that executes for longer than 150-200ms is a potential problem, because such requests tend to provoke "avalanches" under heavy-load scenarios: each running query reserves a portion of the DB user's memory buffers for grouping and sorting operations, and each new SQL request may evict cached results and indexes from the shared memory pool (depending on DB settings), making subsequent queries slower and slower. That's not usually an issue when the server has 256+ GB of memory, but it can cause degraded performance on limited-memory setups.
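One way to spot such avalanche-prone queries early is to rank statements by mean execution time (a sketch; it assumes the `pg_stat_statements` extension is enabled on the Horizon database, and the timing column is called `mean_time` before Postgres 13):

```sql
-- Requires shared_preload_libraries = 'pg_stat_statements'
-- and CREATE EXTENSION pg_stat_statements; run once per database.
SELECT substring(query, 1, 80) AS query_start,
       calls,
       mean_exec_time,    -- mean_time on Postgres < 13
       shared_blks_read   -- rough proxy for buffer-cache pressure
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```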
Yeah, we'll take a look. What was the timestamp when you tested?
I submitted the issue at |
We've discovered that some other queries don't use proper indexes. We're currently verifying with the ops team when was the last run of |
OK, it looks like
My understanding is that Postgres runs a nested loop (to join the tables): the outer loop finds all operations for a given account, then the inner loop checks if the given operation is actually a payment. Now, when there are many non-payment ops, Postgres needs to skip many of them until it finds the requested number (`LIMIT`) of payments.

The problem is that it's not possible to improve the query with the existing schema. The data required for this query lives in two different tables: the operation type in `history_operations`, and the account participation in `history_operation_participants`.

It looks like the only reasonable, long-term solution is to denormalize: add a `type` column to `history_operation_participants` and index it together with the account and operation IDs.
Then all queries like:

```sql
WHERE hopp.history_account_id = '5308433'
  AND hopp.type IN ('0', '1', '2', '13', '8')
  AND hopp.history_operation_id < '9223372036854775807'
```

will use the index. I'll try this in the staging cluster this week.
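To confirm the planner actually picks it up, something like this should work (a sketch; the selected column, ordering, and limit are illustrative, not the exact Horizon query):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT hopp.history_operation_id
FROM history_operation_participants hopp
WHERE hopp.history_account_id = '5308433'
  AND hopp.type IN ('0', '1', '2', '13', '8')
  AND hopp.history_operation_id < '9223372036854775807'
ORDER BY hopp.history_operation_id DESC
LIMIT 200;
-- Expect an Index Scan on hist_op_p_account_type_id instead of a
-- nested loop joining history_operations for every participant row.
```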
I tried the following steps:

```sql
ALTER TABLE history_operation_participants ADD type integer DEFAULT NULL;

UPDATE history_operation_participants
SET type = (SELECT type FROM history_operations
            WHERE id = history_operation_participants.history_operation_id);

VACUUM VERBOSE history_operation_participants; -- to reclaim space after UPDATE

CREATE UNIQUE INDEX hist_op_p_account_type_id
ON history_operation_participants
USING BTREE (history_account_id, type, history_operation_id);
```

Total query time was around 10 hours on AWS RDS.
This is a huge improvement; however, I was wondering if we can do better.
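One knob that might shrink the migration window (a sketch, not measured; the key bounds and batch size are illustrative) is backfilling in ranges and building the index without blocking writes:

```sql
-- Backfill in key ranges so each UPDATE holds row locks briefly and
-- autovacuum can keep up; repeat, advancing the bounds, until done.
UPDATE history_operation_participants hop
SET type = ho.type
FROM history_operations ho
WHERE ho.id = hop.history_operation_id
  AND hop.history_operation_id >= 0
  AND hop.history_operation_id < 1000000;

-- Builds without an exclusive lock; cannot run inside a transaction.
CREATE UNIQUE INDEX CONCURRENTLY hist_op_p_account_type_id
ON history_operation_participants
USING BTREE (history_account_id, type, history_operation_id);
```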
To sum it up, I think I'm going to test a new field and index like this:

```sql
ALTER TABLE history_operation_participants ADD is_payment boolean DEFAULT NULL;

UPDATE history_operation_participants
SET is_payment = (SELECT type IN ('0','1','2','13','8') FROM history_operations
                  WHERE id = history_operation_participants.history_operation_id);

VACUUM VERBOSE history_operation_participants; -- to reclaim space after UPDATE

CREATE UNIQUE INDEX hist_op_p_account_payment_id
ON history_operation_participants
USING BTREE (history_account_id, is_payment, history_operation_id);
```

After the new migration, we should be able to rewrite the payment queries' `WHERE` clause to use the new column.
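For illustration, the rewritten per-account payments lookup could then take this shape (a sketch; the real Horizon query selects more columns and joins in full operation details):

```sql
SELECT hopp.history_operation_id
FROM history_operation_participants hopp
WHERE hopp.history_account_id = '5308433'
  AND hopp.is_payment = TRUE
  AND hopp.history_operation_id < '9223372036854775807'
ORDER BY hopp.history_operation_id DESC
LIMIT 200;
-- Matches hist_op_p_account_payment_id exactly: equality on the first
-- two index columns, range plus ordering on the third.
```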
This is an impressive improvement! It seems a little special-case, though. We might forget to update `is_payment` when new payment operation types are added.

It seems like the query analyzer is just straight-up getting it wrong. If you ran separate queries, one for each of those 5 payment types, it seems like they'd be much faster than the 1.8 seconds that's happening above. The logic is that the cost of each per-type index scan should be tiny; the planner just isn't choosing that plan for the combined `IN` filter.

The ideal would be to figure out a way to get the query optimizer to do this right, but another option is to execute those 5 queries and concatenate their results. Not a big deal if you're already well down the `is_payment` path.
@bartekn After a second glance at the query, I spotted that you are using stringified values for the integer `type` column (e.g. `'0'` instead of `0`).

Also, have you tried building an expression index to support this particular query instead of introducing a new field? It might be better in terms of space savings. Something like:

```sql
CREATE INDEX idx_history_operations_is_payment
ON history_operations ((type IN (0, 1, 2, 8, 13)));
```

(Note the double parentheses - Postgres requires an expression, as opposed to a plain column, to be parenthesized.) This won't provide the same performance as a new column plus composite index, of course.
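For the planner to consider such an expression index, the query has to repeat the indexed expression verbatim (a minimal sketch):

```sql
-- The boolean expression below is exactly what the index stores, so
-- the planner can match it against idx_history_operations_is_payment.
EXPLAIN
SELECT id
FROM history_operations
WHERE type IN (0, 1, 2, 8, 13);
```

Note it only helps with the type filter on `history_operations`; the account filter still lives in the participants table.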
I think the query planner is sophisticated, but probably not enough to realize that it's faster to do it that way instead of using a different index and then filtering rows. I like the idea of either running the query multiple times and merging the results in Horizon code, or maybe using `UNION ALL` in a single query.
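A sketch of the `UNION ALL` variant (assuming the `(history_account_id, type, history_operation_id)` index from above; each branch is a cheap per-type index scan, and the outer query merges them):

```sql
SELECT history_operation_id FROM (
  (SELECT history_operation_id FROM history_operation_participants
   WHERE history_account_id = '5308433' AND type = '0'
   ORDER BY history_operation_id DESC LIMIT 200)
  UNION ALL
  (SELECT history_operation_id FROM history_operation_participants
   WHERE history_account_id = '5308433' AND type = '1'
   ORDER BY history_operation_id DESC LIMIT 200)
  -- ...one branch for each remaining payment type (2, 8, 13)...
) AS per_type
ORDER BY history_operation_id DESC
LIMIT 200;
```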
Yes, Postgres will convert the values before running the query, so it's fine. I used
This won't solve the problem because of the lack of an index on `history_operation_participants` - the account filter there is what drives the nested loop.
The first fix I made was adding the `type` column to `history_operation_participants` (the steps above).
It no longer needs to check the other table, but it still needs to filter out a lot of rows in `history_transactions`. What could help is adding `transaction_successful` to the history tables, so the success check can be covered by an index as well.
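A sketch of that denormalization (assuming the success flag lives in `history_transactions.successful` and that `history_operations` carries a `transaction_id` column; both names are my guesses at the schema):

```sql
ALTER TABLE history_operations ADD transaction_successful boolean DEFAULT NULL;

-- Copy the per-transaction flag down to each of its operations so the
-- payments query never needs to visit history_transactions at all.
UPDATE history_operations ho
SET transaction_successful = ht.successful
FROM history_transactions ht
WHERE ht.id = ho.transaction_id;
```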
Today, during the Horizon team meeting, we decided to prioritize fixing this. Please add any other ideas to this issue by Nov 27th so we have time to implement this in the 0.24.0 release. Otherwise, we'll add it in a later release.
I checked other types of slow queries to make sure we do all the necessary migrations in one go. I realized that the fix above is not enough, because what we fix here is only one case where many rows must be scanned in a nested loop to check the query condition: finding successful payment operations for accounts with a large number of non-payment ops. For accounts with many non-payment ops that are part of failed txs, I think this fix won't make queries faster, because a loop checking `transaction_successful` still has to scan many rows.

So ultimately, in order to support super fast "For Accounts" queries in history tables, we need to add the following indexes:
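An illustrative sketch of such a set (assuming the denormalized `type` and `transaction_successful` columns discussed above; these index names are hypothetical, not the ones from the actual migration):

```sql
-- Payments (and other typed lookups) for one account, successful only:
CREATE INDEX hop_by_account_type_success_id
ON history_operation_participants
USING BTREE (history_account_id, type, transaction_successful, history_operation_id);

-- Transactions for one account, successful only:
CREATE INDEX htp_by_account_success_id
ON history_transaction_participants
USING BTREE (history_account_id, transaction_successful, history_transaction_id);
```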
Here's how the queries will use them after updating them to use proper columns:
Thoughts:
Thanks Bartek! Like you point out, it's pretty rough to have to add all those new indices. To me it seems like those two middle indices don't provide meaningful value beyond the top index |
Actually, how about partitioning on `transaction_successful`?
According to Postgres docs:
So using only
This is interesting, but I haven't found information in the docs on whether it's possible to send queries that get data from two partitions automatically (instead of using e.g. `UNION`).
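For reference, a minimal sketch of list-partitioning on a boolean (table and column names are illustrative; a plain `SELECT` reads all partitions automatically, and adding the partition key to the `WHERE` clause lets the planner prune to one):

```sql
CREATE TABLE history_operations_p (
    id                     bigint  NOT NULL,
    type                   integer NOT NULL,
    transaction_successful boolean NOT NULL
) PARTITION BY LIST (transaction_successful);

CREATE TABLE history_operations_succeeded PARTITION OF history_operations_p
    FOR VALUES IN (TRUE);
CREATE TABLE history_operations_failed PARTITION OF history_operations_p
    FOR VALUES IN (FALSE);

-- Scans both partitions:
SELECT count(*) FROM history_operations_p;
-- Prunes to the "succeeded" partition:
SELECT count(*) FROM history_operations_p WHERE transaction_successful;
```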
Great points! Given this index:
I'm liking the `UNION` idea. Do you see any problems with that approach in the real Horizon queries? Obviously they're more complex and involve joins that may not mix well with `UNION`.

As far as partitions go, my understanding is that queries automatically get data from all partitions without needing `UNION`.

I'm leaning away from partitions, though. They seem to add serious maintenance overhead and constraints.
Looks like Figma ran into a similar problem: https://www.figma.com/blog/post-mortem-service-disruption-on-january-21-22-2020/ They say that upgrading to Postgres 11 solved their query planner problems. I wonder if this could solve our issue too. I'm pretty confident that a schema change is the only option but… maybe.
Let's try it out on the staging server first, before any code/schema changes.
I attempted the problematic payments request on the staging db, which was created recently and has been fully ingested. Unfortunately, the request still times out on staging. So at this point we should check whether Postgres 11 would improve the query. If that doesn't work, then we can consider schema changes.
@tamirms could you retest the query?