Optimize trade aggregations query #632
@bartekn Since the description is empty here, I wonder if we log the actual slow queries. I want to see the explain analyze of it, but getting the …
Yes, we do log slow queries. Here's an example: …
@bartekn it is actually not terribly bad, as I don't see any …
One thing that caught my eye here is that the time period is very large: it's basically aggregating data since the first ledger, and the index scan using this value is taking most of the execution time. Given that, I will send you the slow query log from the last few days to check whether the query is slow only for larger periods.
Trade aggregation can take a lot of time because it essentially scans every trade in the given time range. When we originally built it we limited the bucket resolutions to a few values so that, in the future, we could optimize by caching buckets at these resolutions (maybe by utilizing materialized views). Maybe the future is now? If we disallow long time ranges, it's going to break pretty much every trading client out there. Maybe we can introduce this limit as a configuration value and start warning users that in the future that value on the public instance will be set to a month.
Do we still limit the bucket resolutions to a few values, or did we relax the constraint at some point in the past? Caching the buckets might be challenging, as we would have to update the cache constantly for each resolution, if I understand it correctly.
@bartekn after talking to @tomerweller, we think one possible approach is to have a database table or a materialized view store the trade aggregations with a resolution (bucket) equal to an hour, starting from the beginning, since the offset is in hours. For requests with a resolution greater than or equal to an hour, we would calculate the result from this new table without any limit on the period. For requests with a resolution of less than an hour, we would calculate the result from the original table but limit the period to within a day. What do you think about this approach?
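For illustration, here is a minimal sketch of what such an hourly pre-aggregation could look like in Postgres. The view name and the column list are simplifications for this sketch, not Horizon's actual schema:

```sql
-- Hypothetical sketch of hourly trade-aggregation buckets.
-- Names and columns are illustrative, not part of Horizon's schema.
CREATE MATERIALIZED VIEW trade_agg_hourly AS
SELECT
    base_asset_id,
    counter_asset_id,
    -- bucket start: ledger close time truncated to the hour, in ms since epoch
    div(cast(extract(epoch FROM ledger_closed_at) * 1000 AS bigint), 3600000) * 3600000 AS timestamp,
    count(*)            AS count,
    sum(base_amount)    AS base_volume,
    sum(counter_amount) AS counter_volume
    -- open/high/low/close would also need to be carried, e.g. via the custom
    -- first/last/max_price/min_price aggregates used in the queries in this issue
FROM history_trades
GROUP BY base_asset_id, counter_asset_id, 3;
```

Requests with a resolution of an hour or more would then re-aggregate these rows; sub-hour requests would still hit history_trades with the period capped as described above.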
Overall it seems good but:
In the context of rate limiting I'm wondering if allowing full scans on a huge table like history_trades is a good idea.
Agreed. Let's get some numbers.
Definitely might. In this case we know exactly the rate of data growth (per pair), so we can test it.
In theory you should have exactly one update per hour, including as many rows as there are pairs that have been traded.
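If the buckets lived in a plain table rather than a materialized view, that hourly update could be a single append per hour, along these lines (a sketch under the same illustrative names as above):

```sql
-- Hypothetical incremental update, run once per hour: append the just-finished
-- hour's buckets, one row per asset pair that actually traded in that hour.
-- Assumes trade_agg_hourly is a plain table; all names are illustrative.
INSERT INTO trade_agg_hourly
    (base_asset_id, counter_asset_id, timestamp, count, base_volume, counter_volume)
SELECT
    base_asset_id,
    counter_asset_id,
    div(cast(extract(epoch FROM ledger_closed_at) * 1000 AS bigint), 3600000) * 3600000,
    count(*),
    sum(base_amount),
    sum(counter_amount)
FROM history_trades
WHERE ledger_closed_at >= date_trunc('hour', now()) - interval '1 hour'
  AND ledger_closed_at <  date_trunc('hour', now())
GROUP BY 1, 2, 3;
```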
With the hourly bucket optimization, we don't have to worry about limiting the period to be under a month on the client side anymore, right?
Yeah, I was thinking about a solution for API clients in case we don't implement the bucket optimization and just limit requests to a 1-month period.
It sounds like this is a fair bit of work, so I'd say it's lower priority.
I looked at this a bit more; see my findings and proposed solution below. According to the Horizon logs over a period of 7 days, there are over 2.3M requests to the trade aggregations endpoint. An example of a delayed request is shown below; this request took over 44 seconds.
This translates into the SQL query:

SELECT timestamp, count(*) as count, sum(base_amount) as base_volume, sum(counter_amount) as counter_volume,
       sum(counter_amount)/sum(base_amount) as avg, max_price(price) as high, min_price(price) as low,
       first(price) as open, last(price) as close
FROM (SELECT div((cast((extract(epoch from ledger_closed_at) * 1000 ) as bigint) - 0), 86400000)*86400000 + 0 as timestamp,
             history_operation_id, "order", base_asset_id, base_amount, counter_asset_id, counter_amount,
             ARRAY[price_n, price_d] as price
      FROM history_trades
      WHERE base_asset_id = '2' AND counter_asset_id = '5833' AND ledger_closed_at >= '1970-01-01 00:00:00'
      ORDER BY history_operation_id, "order") AS htrd
GROUP BY timestamp
ORDER BY timestamp desc
LIMIT 200
When analyzed, we get this query plan:

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Limit (cost=2833991.45..2833991.95 rows=200 width=80) (actual time=44213.773..44213.893 rows=200 loops=1)
-> Sort (cost=2833991.45..2833991.95 rows=200 width=80) (actual time=44213.771..44213.816 rows=200 loops=1)
Sort Key: (((div(((((date_part('epoch'::text, history_trades.ledger_closed_at) * '1000'::double precision))::bigint - 0))::numeric, '86400000'::numeric) * '86400000'::numeric) + '0'::numeric)) DESC
Sort Method: top-N heapsort Memory: 78kB
-> HashAggregate (cost=2833979.30..2833983.80 rows=200 width=80) (actual time=44212.371..44213.110 rows=516 loops=1)
Group Key: (((div(((((date_part('epoch'::text, history_trades.ledger_closed_at) * '1000'::double precision))::bigint - 0))::numeric, '86400000'::numeric) * '86400000'::numeric) + '0'::numeric))
-> Sort (cost=596769.28..601960.03 rows=2076297 width=52) (actual time=6214.336..7484.817 rows=2687984 loops=1)
Sort Key: history_trades.history_operation_id, history_trades."order"
Sort Method: external sort Disk: 249592kB
-> Index Scan using htrd_pair_time_lookup on history_trades (cost=0.56..297799.79 rows=2076297 width=52) (actual time=0.061..3635.642 rows=2687984 loops=1)
Index Cond: ((base_asset_id = '2'::bigint) AND (counter_asset_id = '5833'::bigint) AND (ledger_closed_at >= '1970-01-01 00:00:00'::timestamp without time zone))
Planning time: 0.612 ms
Execution time: 44250.765 ms
(13 rows)
Precompute Trade Aggregation Buckets

One of the suggestions mentioned earlier was pre-computing the trade aggregation buckets. I think that is a good idea, so I decided to test it out. I precomputed the buckets for the asset pair above into a view (trdagg_view) and queried it with:

SELECT timestamp, sum(count) as count, sum(base_volume) as base_volume, sum(counter_volume) as counter_volume,
       sum(avg)/count(*) as avg, max_price(high) as high, min_price(low) as low, first(open) as open, last(close) as close
FROM (SELECT div((cast(timestamp as bigint) - 0), 86400000) * 86400000 + 0 as timestamp,
             count, base_volume, counter_volume, avg, high, low, open, close
      FROM trdagg_view
      WHERE base_asset_id = '2' AND counter_asset_id = '5833' AND timestamp >= '1468800000') as htrd
GROUP BY timestamp
ORDER BY timestamp desc
LIMIT 200;

Analyzing this gives:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Limit (cost=1498.62..1710.12 rows=200 width=196) (actual time=18.271..75.610 rows=200 loops=1)
-> GroupAggregate (cost=1498.62..13543.55 rows=11390 width=196) (actual time=18.269..75.516 rows=200 loops=1)
Group Key: (((div((((trdagg_view."timestamp")::bigint - 0))::numeric, '86400000'::numeric) * '86400000'::numeric) + '0'::numeric))
-> Sort (cost=1498.62..1527.10 rows=11390 width=196) (actual time=17.968..19.284 rows=4432 loops=1)
Sort Key: (((div((((trdagg_view."timestamp")::bigint - 0))::numeric, '86400000'::numeric) * '86400000'::numeric) + '0'::numeric)) DESC
Sort Method: quicksort Memory: 3358kB
-> Seq Scan on trdagg_view (cost=0.00..731.19 rows=11390 width=196) (actual time=0.017..12.141 rows=11391 loops=1)
Filter: (("timestamp" >= '1468800000'::numeric) AND (base_asset_id = '2'::bigint) AND (counter_asset_id = '5833'::bigint))
Planning time: 0.131 ms
Execution time: 75.709 ms
(10 rows)

⚡ Alright, precomputing the buckets definitely looks like the way to go, but there are some issues. Let us look at some numbers: currently, there are 25,070 assets in the … Thankfully (at the moment), not all of those asset pairs are in play. Looking at the …

Bounded time range

Another solution that I have considered is having the time range bounded to multiples of the specified resolution. If we do this, the slow query above will be represented as:

SELECT timestamp, count(*) as count, sum(base_amount) as base_volume, sum(counter_amount) as counter_volume,
       sum(counter_amount)/sum(base_amount) as avg, max_price(price) as high, min_price(price) as low,
       first(price) as open, last(price) as close
FROM (SELECT div((cast((extract(epoch from ledger_closed_at) * 1000 ) as bigint) - 0), 86400000)*86400000 + 0 as timestamp,
             history_operation_id, "order", base_asset_id, base_amount, counter_asset_id, counter_amount,
             ARRAY[price_n, price_d] as price
      FROM history_trades
      WHERE base_asset_id = '2' AND counter_asset_id = '5833' AND ledger_closed_at >= '2019-01-26 00:00:00'
      ORDER BY history_operation_id, "order") AS htrd
GROUP BY timestamp
ORDER BY timestamp desc
LIMIT 200

When we analyze this, we get:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Limit (cost=1053711.98..1053712.48 rows=200 width=80) (actual time=2530.585..2530.708 rows=200 loops=1)
-> Sort (cost=1053711.98..1053712.48 rows=200 width=80) (actual time=2530.583..2530.632 rows=200 loops=1)
Sort Key: (((div(((((date_part('epoch'::text, history_trades.ledger_closed_at) * '1000'::double precision))::bigint - 0))::numeric, '86400000'::numeric) * '86400000'::numeric) + '0'::numeric)) DESC
Sort Method: quicksort Memory: 78kB
-> HashAggregate (cost=1053699.84..1053704.34 rows=200 width=80) (actual time=2530.109..2530.385 rows=201 loops=1)
Group Key: (((div(((((date_part('epoch'::text, history_trades.ledger_closed_at) * '1000'::double precision))::bigint - 0))::numeric, '86400000'::numeric) * '86400000'::numeric) + '0'::numeric))
-> Sort (cost=307102.24..308834.49 rows=692898 width=52) (actual time=369.907..435.424 rows=151092 loops=1)
Sort Key: history_trades.history_operation_id, history_trades."order"
Sort Method: external sort Disk: 14032kB
-> Index Scan using htrd_pair_time_lookup on history_trades (cost=0.56..226349.23 rows=692898 width=52) (actual time=0.038..229.375 rows=151092 loops=1)
Index Cond: ((base_asset_id = '2'::bigint) AND (counter_asset_id = '5833'::bigint) AND (ledger_closed_at >= '2019-01-26 00:00:00'::timestamp without time zone))
Planning time: 0.202 ms
Execution time: 2533.212 ms
(13 rows)
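For context on the bound used above: because the query orders buckets by timestamp descending and is capped by the hardcoded LIMIT 200, with a daily resolution only the most recent 200 buckets can ever appear in the response, so older start times only add scan work. A hypothetical way to compute such a clamp, using now() as the end time purely for illustration:

```sql
-- Hypothetical sketch: for a descending query with daily buckets (86400000 ms)
-- and LIMIT 200, anything older than ~200 buckets before the end time cannot
-- appear in the result, so the start bound can be clamped forward to this value.
SELECT to_timestamp(
           (extract(epoch FROM now()) * 1000 - 86400000::bigint * 200) / 1000
       ) AS earliest_useful_start;
```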
Proposed Solution

My proposal for optimizing the trade_aggregation endpoint is that we approach this in stages.
I would love to hear some feedback on this. My plan is to implement stage 0 in the next sprint.
Wow! Great find with the time bounding approach. That nested query completely ignores paging right now and brings in the entire time range worth of trades for every page. Whoever wrote that monstrosity... 🤷♂️ I agree about stage 0. Given the performance gains from that, I think that precomputing buckets is pretty aggressive. Let's put that on hold until evidence suggests otherwise.
Yikes! It's kinda amazing that this query isn't doing this already. Totally agree we should do that ASAP and then re-measure. I'd expect the time bounding to handle insane cases like asking for 1m summaries of an asset from genesis to the present day. I'd like to see the performance of some large time-bounded 1w resolution queries, though. There may still be work to do there.
Added bounded time range in #2856.
…des aggregation query (#2856)

This commit limits the time range in the trade aggregations query to a maximum number of buckets bounded by the `limit` param. In #632 (comment) it was suggested that the query in trade aggregations needs to prepare buckets for the entire history if start and end dates are not provided. This can be fixed because we limit the number of buckets returned to a user. The code in this commit adjusts the start time and end time to limit the range to the maximum range visible to the API user.
I implemented the Bounded Range solution in #2856 and I'm currently checking it in stg. There's a small improvement in p99 response time, and a much better one in p95, but it's still above 1s for many queries. After #2869, which improves the performance of …, I think we should spend more time next sprint and implement the buckets solution. It will be much simpler than a couple of months ago:
To make sure users can upgrade without having to reingest their DBs, we can add the new code behind a feature flag. Moving to the Horizon 1.8.0 milestone temporarily. To be discussed in the Horizon Team meeting.
We had to revert the Bounded Range solution in #2893. It doesn't work for markets with some buckets missing in a given time range and limit (more in the revert PR description). Again, we should probably implement the precomputed buckets idea; this issue would not occur in that solution.
Damn, that's disappointing. Since the issue is now not even partially resolved, let's prioritise this now, then. Moved to the top of the backlog.
Sidenote: reading back through this, the explain line saying …

Intuitively, I'd say we should do bounded ranges (more flexible, and lower disk use) and do multiple queries to fix the limit issues from #2893. We could possibly combine that with table partitioning. This would need some experimentation and analysis of the table/index sizes and access patterns, but we could partition either on:
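For instance, a hypothetical sketch of time-based partitioning, not necessarily one of the axes meant above and with guessed column types, could look like:

```sql
-- Hypothetical sketch only: range-partition trades by ledger close time so that
-- recent, frequently queried data lives in small partitions. Column types are
-- assumptions; a real migration would also have to carry over indexes and data.
CREATE TABLE history_trades_partitioned (
    history_operation_id bigint    NOT NULL,
    "order"              integer   NOT NULL,
    base_asset_id        bigint    NOT NULL,
    base_amount          bigint    NOT NULL,
    counter_asset_id     bigint    NOT NULL,
    counter_amount       bigint    NOT NULL,
    price_n              bigint,
    price_d              bigint,
    ledger_closed_at     timestamp NOT NULL
) PARTITION BY RANGE (ledger_closed_at);

CREATE TABLE history_trades_y2020 PARTITION OF history_trades_partitioned
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
```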
Per discussion in the team meeting today, I think this will be easier if we change some of the trade aggregation endpoint semantics, particularly around start_time, end_time, and offset.

Proposed API param changes:

- start_time (optional) long
+ start_time (required) long
The lower time boundary represented as milliseconds since epoch.
- end_time (optional) long
+ end_time (required) long
The upper time boundary represented as milliseconds since epoch.
- resolution (optional) long
+ resolution (required) long
The segment duration represented as milliseconds. Supported values are 1 minute (60000), 5 minutes (300000), 15 minutes (900000), 1 hour (3600000), 1 day (86400000) and 1 week (604800000).
offset (optional) long
- Segments can be offset using this parameter. Expressed in milliseconds. Can only be used if the resolution is greater than 1 hour. Value must be in whole hours, less than the provided resolution, and less than 24 hours.
+ Segments can be offset using this parameter. Expressed in milliseconds. Value must be in multiples of 15 minutes, less than the provided resolution, and less than 24 hours. Used to adjust queries for a timezone offset.
order (optional)
A designation of the order in which records should appear. Options include asc (ascending) or desc (descending). If this argument isn’t set, it defaults to asc.
limit (optional)
The total number of records returned. The limit can range from 1 to 200 - an upper limit that is hardcoded in Horizon for performance reasons. If this argument isn’t designated, it defaults to 10.
... asset pair type/issuer/code ...

The offset query example: …

Range/Resolution Tradeoffs

As discussed, we should enforce range/resolution tradeoffs. The idea being that, for a given resolution, there's a maximum range we will serve, capped at limit (200) buckets.
In essence:

if (end_time - start_time) > (resolution * 200) {
    w.WriteHeader(http.StatusBadRequest)
    return
}

Given most of our requests are for 1min and 15min resolutions, this should mean most requests are constrained to smaller sections of the database, making partitioning/sharding more effective.
This contract change breaks clients that request a duration that can contain multiple pages of results and depend on the paging links. Maybe it's worth taking a look at public Horizon logs to see what percentage of requests this would break?
It seems like we need some more information on the paging strategy of trading clients to make an informed decision. The suggestion above by @paulbellamy has the potential to introduce breakage in paging (or not). And, to the best of my understanding, #2856 by @bartekn was reverted because it effectively changed the paging contract (more on this below).

This is a bit of a tragedy because trade aggregations don't really need opinionated server-side paging. The combination of parameters gives clients all the freedom in the world to page as they please. However, when creating this endpoint we adhered to the same HAL response structure as the rest of the API. I'll add this to my upcoming book titled …

Revisiting #2893, which reverted #2856: it seems like the issue with time bounding is that the results are sparse (empty buckets are not included), which breaks the paging logic. The paging logic assumes that …

What if we restore #2856 and solve this problem by artificially un-sparsing the results, adding 0-buckets where there are no buckets in the results? This should restore the paging logic. It is very inefficient for rarely traded pairs, but maybe that's not the end of the world. We should also probably not serve 0-buckets from before the existence of a given orderbook.

This is an implicit contract change, as clients have never had to deal with empty buckets, but it's fairly non-offensive and we can check with the major trading clients whether it breaks them. My bet is that it won't introduce breakage.
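A rough sketch of the un-sparsing idea on the SQL side is below; it is purely illustrative: trade_buckets stands in for whatever relation the per-bucket aggregates come from, the parameters are epoch milliseconds, and only a few columns are shown.

```sql
-- Hypothetical: emit one row per bucket in the requested range so paging stays
-- dense, with zero values where no trades happened. trade_buckets and the
-- parameter names are placeholders, not Horizon's actual schema.
SELECT
    b.timestamp,
    coalesce(t.count, 0)          AS count,
    coalesce(t.base_volume, 0)    AS base_volume,
    coalesce(t.counter_volume, 0) AS counter_volume
FROM generate_series(:start_time, :end_time - :resolution, :resolution) AS b(timestamp)
LEFT JOIN trade_buckets t
       ON t.timestamp        = b.timestamp
      AND t.base_asset_id    = :base_asset_id
      AND t.counter_asset_id = :counter_asset_id
ORDER BY b.timestamp DESC
LIMIT 200;
```

The same zero-filling could equally be done in Horizon's Go code after the query returns; either way, buckets from before the pair's first trade would still need to be dropped, as noted above.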
I've had an idea for how to reduce the disk size of pre-computed buckets, which would allow us to make this more efficient without changing the API. Just seeing how feasible it is before committing to that, though.