Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Station Public Dashboard - optimise "Leaderboard - FIL Earned" #178

Open
Tracked by #213 ...
bajtos opened this issue Oct 9, 2024 · 2 comments
Open
Tracked by #213 ...

Station Public Dashboard - optimise "Leaderboard - FIL Earned" #178

bajtos opened this issue Oct 9, 2024 · 2 comments
Labels
good first issue Good for newcomers

Comments

@bajtos
Copy link
Contributor

bajtos commented Oct 9, 2024

The panel "Leaderboard - FIL Earned" calls https://stats.filspark.com/participants/top-earning?from=2024-10-02&to=today, which takes several seconds to load (3-5 seconds).

Let's optimise this query to bring the query duration under 500ms.

@bajtos
Copy link
Contributor Author

bajtos commented Oct 9, 2024

Under the hood, the API executes a non-trivial SQL query:

https://github.com/filecoin-station/spark-stats/blob/8a3ec03142d5268b96d2846cf2d81efea7c158bc/stats/lib/platform-stats-fetchers.js#L89-L103

   WITH latest_scheduled_rewards AS (
      SELECT DISTINCT ON (participant_address) participant_address, scheduled_rewards
      FROM daily_scheduled_rewards
      ORDER BY participant_address, day DESC
    )
    SELECT
      COALESCE(drt.to_address, lsr.participant_address) as participant_address,
      COALESCE(SUM(drt.amount), 0) + COALESCE(lsr.scheduled_rewards, 0) as total_rewards
    FROM daily_reward_transfers drt
    FULL OUTER JOIN latest_scheduled_rewards lsr
      ON drt.to_address = lsr.participant_address
    WHERE (drt.day >= $1 AND drt.day <= $2) OR drt.day IS NULL
    GROUP BY COALESCE(drt.to_address, lsr.participant_address), lsr.scheduled_rewards
    ORDER BY total_rewards DESC

Query plan printed by EXPLAIN ANALYZE:

 Sort  (cost=172922.90..172949.97 rows=10827 width=76) (actual time=3694.975..3702.066 rows=34645 loops=1)
   Sort Key: ((COALESCE(sum(drt.amount), '0'::numeric) + COALESCE(daily_scheduled_rewards.scheduled_rewards, '0'::numeric))) DESC
   Sort Method: external merge  Disk: 2576kB
   ->  GroupAggregate  (cost=171926.69..172197.37 rows=10827 width=76) (actual time=3629.611..3667.935 rows=34645 loops=1)
         Group Key: (COALESCE(drt.to_address, daily_scheduled_rewards.participant_address)), daily_scheduled_rewards.scheduled_rewards
         ->  Sort  (cost=171926.69..171953.76 rows=10827 width=57) (actual time=3629.575..3638.508 rows=35835 loops=1)
               Sort Key: (COALESCE(drt.to_address, daily_scheduled_rewards.participant_address)), daily_scheduled_rewards.scheduled_rewards
               Sort Method: external merge  Disk: 2320kB
               ->  Merge Full Join  (cost=165457.50..171201.16 rows=10827 width=57) (actual time=3030.362..3604.880 rows=35835 loops=1)
                     Merge Cond: (daily_scheduled_rewards.participant_address = drt.to_address)
                     Filter: (((drt.day >= '2024-08-01'::date) AND (drt.day <= '2024-10-01'::date)) OR (drt.day IS NULL))
                     Rows Removed by Filter: 4166
                     ->  Unique  (cost=165457.21..170455.69 rows=19596 width=59) (actual time=3030.299..3571.583 rows=35698 loops=1)
                           ->  Sort  (cost=165457.21..167956.45 rows=999696 width=59) (actual time=3030.296..3467.228 rows=999175 loops=1)
                                 Sort Key: daily_scheduled_rewards.participant_address, daily_scheduled_rewards.day DESC
                                 Sort Method: external merge  Disk: 68400kB
                                 ->  Seq Scan on daily_scheduled_rewards  (cost=0.00..21801.96 rows=999696 width=59) (actual time=11.569..163.304 rows=999175 loops=1)
                     ->  Index Scan using daily_reward_transfers_to_address_day on daily_reward_transfers drt  (cost=0.29..344.29 rows=8927 width=60) (actual time=0.031..12.213 rows=8953 loops=1)
 Planning Time: 0.215 ms
 JIT:
   Functions: 17
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 1.892 ms, Inlining 0.000 ms, Optimization 0.623 ms, Emission 10.624 ms, Total 13.139 ms
 Execution Time: 3724.605 ms

The following index can speed up the query from 3.7s to 2.1s:

CREATE INDEX daily_scheduled_rewards_participant_day ON daily_scheduled_rewards (participant_address, day DESC);

However, that's not enough.

I think we need to find a way how to optimise the first query that looks up the latest scheduled rewards for each participant, as that's taking most of the time.

SELECT DISTINCT ON (participant_address) participant_address, scheduled_rewards
FROM daily_scheduled_rewards
ORDER BY participant_address, day DESC

Further reading:

The following query seems to work faster (completes under ~700ms) and produce the same data (albeit unsorted, which saves ~400ms):

      WITH latest_scheduled_rewards_update AS (
        SELECT participant_address, MAX(day) AS day
        FROM daily_scheduled_rewards
        GROUP BY participant_address
      )
      SELECT participant_address, scheduled_rewards
      FROM latest_scheduled_rewards_update
      LEFT JOIN daily_scheduled_rewards USING (participant_address, day);

After adding another index:

CREATE INDEX daily_scheduled_rewards_participant ON daily_scheduled_rewards (participant_address);

The following query returns the data we need in about ~700ms.

WITH 
latest_scheduled_rewards_update AS (
  SELECT participant_address, MAX(day) AS day
  FROM daily_scheduled_rewards
  GROUP BY participant_address
),
latest_scheduled_rewards AS (
  SELECT participant_address, scheduled_rewards
  FROM latest_scheduled_rewards_update
  LEFT JOIN daily_scheduled_rewards USING (participant_address, day)
)
SELECT
  COALESCE(drt.to_address, lsr.participant_address) as participant_address,
  COALESCE(SUM(drt.amount), 0) + COALESCE(lsr.scheduled_rewards, 0) as total_rewards
FROM daily_reward_transfers drt
FULL OUTER JOIN latest_scheduled_rewards lsr
  ON drt.to_address = lsr.participant_address
WHERE (drt.day >= '2024-08-01' AND drt.day <= '2024-10-01') OR drt.day IS NULL
GROUP BY COALESCE(drt.to_address, lsr.participant_address), lsr.scheduled_rewards
ORDER BY total_rewards DESC;

I think this could be further improved by using participant_id FK instead of participant_address string because the join operation has to hash participant addresses now.

@bajtos bajtos mentioned this issue Oct 9, 2024
31 tasks
@juliangruber
Copy link
Member

Great work, Miro!!

@bajtos bajtos added the good first issue Good for newcomers label Dec 2, 2024
@bajtos bajtos mentioned this issue Dec 2, 2024
15 tasks
@bajtos bajtos mentioned this issue Jan 6, 2025
16 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants