Make copy/graft/prune work with unevenly distributed rows #5807

lutter · 2025-02-08T02:02:04Z

When we copy/graft/prune, we split the entire work that needs to be done into batches that are each meant to take roughly three minutes, to avoid bloating the subgraph_deployment table. Pruning can throw that target off badly, and when that happens it can cripple the performance of the overall system.

The code that adjusts the size of the batch to hit that target tacitly assumes that the actual work is distributed linearly: if we ask for work covering 10,000 rows (going by vid), we are fine with getting fewer rows, maybe even just a handful, but the density needs to be uniform, i.e., any batch spanning 10,000 vids needs to contain roughly the same number of rows. Pruning breaks this assumption, since in a pruned subgraph the beginning of the subgraph (as determined by block numbers) will be much sparser than the later parts. In one case, this misled the estimation logic into eventually trying to copy a range of 160M vids, since that's what the early part of the subgraph indicated could be copied in three minutes: because the subgraph was pruned, a range of 160M vids at the beginning of the subgraph contained only 128 rows. After that, the subgraph was dense, and copying 160M vids would take many hours.
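To illustrate the failure mode described above, here is a minimal sketch of a batch size that is adjusted under the linear assumption. It is not the actual `AdaptiveBatchSize` code: the struct name `LinearBatchSize`, the `adapt` method, the helper `width_after_sparse_batches`, the starting width of 10,000 vids, and the cap of 2x growth per step are all assumptions made for the example.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the batch size adjustment; the names and the
/// 2x growth cap are assumptions, not the actual graph-node code.
struct LinearBatchSize {
    /// Width of the next batch, measured in vids
    size: i64,
    /// How long a batch is supposed to take
    target: Duration,
}

impl LinearBatchSize {
    /// Scale the vid range by `target / elapsed`, assuming that the time a
    /// batch takes is proportional to the width of its vid range.
    fn adapt(&mut self, elapsed: Duration) {
        let ratio = self.target.as_secs_f64() / elapsed.as_secs_f64().max(0.001);
        // Grow by at most 2x per step
        let factor = ratio.min(2.0);
        self.size = ((self.size as f64) * factor).round().max(1.0) as i64;
    }
}

/// Run `steps` batches that each finish in one second, as they would in a
/// region that contains almost no rows, and return the resulting width.
fn width_after_sparse_batches(steps: usize) -> i64 {
    let mut batch = LinearBatchSize {
        size: 10_000,
        target: Duration::from_secs(180),
    };
    for _ in 0..steps {
        batch.adapt(Duration::from_secs(1));
    }
    batch.size
}

fn main() {
    println!("{}", width_after_sparse_batches(14));
}
```

In this sketch, fourteen fast batches in a sparse region grow the width from 10,000 to 163,840,000 vids. Nothing in the adjustment knows that the next range of that width may be dense, which is the kind of runaway estimate described above.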

This PR removes the assumption that the relation between vid and actual rows is linear. It uses the histogram_bounds from pg_stats to build a piecewise linear function, and estimates the number of rows in a given vid range using that piecewise linear function (the Ogive in the code). Now, when we ask for a batch of 10,000 rows, the code will adapt to an uneven vid distribution and return vid ranges of different sizes for different parts of the table.
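The idea can be sketched as follows. This is a simplified illustration, not the code in `graph/src/util/ogive.rs`: the type `OgiveSketch` and the methods `from_equi_histogram`, `rows_up_to`, `vid_for_rows` and `next_end` are names made up for this example. It relies on the fact that PostgreSQL's `histogram_bounds` split the column values into buckets of approximately equal population, so each bound marks an equal step in the cumulative row count.

```rust
/// Linearly interpolate `x` over `(x, y)` points sorted by strictly
/// increasing `x`, clamping to the first and last point.
fn interpolate(points: &[(f64, f64)], x: f64) -> f64 {
    let (first, last) = (points[0], points[points.len() - 1]);
    if x <= first.0 {
        return first.1;
    }
    if x >= last.0 {
        return last.1;
    }
    // Index of the first point to the right of `x`; always in 1..len here
    let idx = points.partition_point(|p| p.0 <= x);
    let (a, b) = (points[idx - 1], points[idx]);
    a.1 + (x - a.0) / (b.0 - a.0) * (b.1 - a.1)
}

/// Hypothetical piecewise linear map from a vid to the estimated number
/// of rows at or below that vid, plus its inverse.
struct OgiveSketch {
    /// (vid, cumulative rows)
    forward: Vec<(f64, f64)>,
    /// (cumulative rows, vid)
    inverse: Vec<(f64, f64)>,
}

impl OgiveSketch {
    /// `bounds` are equi-depth histogram bounds; `total` is the estimated
    /// row count. Returns `None` if the input cannot form a histogram.
    fn from_equi_histogram(bounds: &[i64], total: usize) -> Option<Self> {
        if bounds.len() < 2 || total == 0 {
            return None;
        }
        if bounds.windows(2).any(|w| w[0] >= w[1]) {
            return None;
        }
        let per_bucket = total as f64 / (bounds.len() - 1) as f64;
        let forward: Vec<(f64, f64)> = bounds
            .iter()
            .enumerate()
            .map(|(i, &b)| (b as f64, i as f64 * per_bucket))
            .collect();
        let inverse = forward.iter().map(|&(vid, rows)| (rows, vid)).collect();
        Some(OgiveSketch { forward, inverse })
    }

    /// Estimated number of rows with a vid at or below `vid`
    fn rows_up_to(&self, vid: i64) -> f64 {
        interpolate(&self.forward, vid as f64)
    }

    /// The vid at which the cumulative row count reaches `rows`
    fn vid_for_rows(&self, rows: f64) -> i64 {
        interpolate(&self.inverse, rows).round() as i64
    }

    /// End of a vid range starting at `start` that should hold about
    /// `size` rows
    fn next_end(&self, start: i64, size: usize) -> i64 {
        self.vid_for_rows(self.rows_up_to(start) + size as f64)
    }
}

fn main() {
    // Made-up bounds: the first 160M vids hold one bucket of 128 rows
    let bounds = [0, 160_000_000, 160_010_000, 160_020_000];
    let ogive = OgiveSketch::from_equi_histogram(&bounds, 384).unwrap();
    println!("sparse: {}", ogive.next_end(0, 64));
    println!("dense: {}", ogive.next_end(160_000_000, 64));
}
```

With these made-up bounds, a request for 64 rows starting at vid 0 yields a range ending at vid 80,000,000, while the same request starting at vid 160,000,000 ends at vid 160,005,000. The batch is sized in rows, and the vid range follows from the estimated distribution.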

mangas · 2025-02-10T15:49:05Z

graph/src/util/ogive.rs

+///
+/// The word 'ogive' is somewhat obscure, but has a lot fewer letters than
+/// 'piecewise linear function'. Copilot also claims that it is also a lot
+/// more fun to say.


lutter self-assigned this Feb 8, 2025

mangas self-requested a review February 8, 2025 10:13

mangas reviewed Feb 10, 2025

lutter mentioned this pull request Feb 11, 2025

store: Try to avoid pathological batch size adjustments #5792

Closed

lutter force-pushed the lutter/nonlinear-batch branch from 11c9b94 to ff607b0 Compare February 11, 2025 01:32

lutter added 9 commits February 10, 2025 17:48

store: Do not assume that copies start at vid == 0

00ea0c8

store: Start copies at the minimum vid, not just at 0

7babbe6

graph: Add utility for handling cumulative histograms

2309ee8

store: Move AdaptiveBatchSize to its own module

cc6c5ce

store: Move batching logic for copies into separate struct

73893d8

store: Introduce a VidRange struct

3c6e482

store: Use VidRange for pruning

194a37a

store: Use VidBatcher to batch pruning queries

3567251

store: Remove unused ToSql/FromSql impls for AdaptiveBatchSize

b19b6c1

lutter force-pushed the lutter/nonlinear-batch branch from ff607b0 to b19b6c1 Compare February 11, 2025 01:53
