Make copy/graft/prune work with unevenly distributed rows #5807

Open. lutter wants to merge 9 commits into base: master

@lutter (Collaborator) commented Feb 8, 2025

When we copy/graft/prune, we split the work that needs to be done into batches that are each meant to take roughly three minutes, which avoids bloating the subgraph_deployment table. Pruned subgraphs can throw that batch sizing off badly, and when that happens it can be crippling for the performance of the overall system.

The code that adjusts the size of the batch to hit that target tacitly assumes that the actual work is distributed linearly over vid. If we ask for work covering a range of 10,000 vids, we are fine with getting fewer rows, maybe even just a handful, but the density needs to be uniform: any range of 10,000 vids needs to contain roughly the same number of rows. Pruning breaks this assumption, since in a pruned subgraph the beginning of the subgraph (as determined by block numbers) will be much sparser than the later parts. In one case, the subgraph had been pruned and a range of 160M vids at its beginning contained only 128 rows. That misled the estimation logic into eventually trying to copy 160M vids in one batch, since the early part of the subgraph indicated that this much could be copied in three minutes. After that sparse region, the subgraph was dense, and copying a range of 160M vids would take many hours.

This PR removes the assumption that the relation between vid and actual rows is linear. It uses the histogram_bounds from pg_stats to build a piecewise linear function (the Ogive in the code) and estimates the number of rows in a given vid range using that function. Now, when we ask for a batch of 10,000 rows, the code adapts to an uneven vid distribution and returns differently sized vid ranges for different parts of the table.
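To make the idea concrete, here is a minimal sketch of how such a piecewise linear estimate could work. It is not the code from this PR: the names `from_bounds`, `rows_up_to`, `vid_for_rows` and `next_end` are hypothetical, and it assumes the histogram bounds are equi-depth (each bucket between two adjacent bounds holds roughly the same number of rows) and that the total row count is known.

```rust
/// Piecewise linear estimate of how many rows have a vid at or below a
/// given value. Hypothetical sketch, not the `Ogive` type from the PR.
struct Ogive {
    /// (vid, estimated cumulative row count), strictly increasing in both.
    points: Vec<(f64, f64)>,
}

/// Linearly interpolate y at x over points sorted by x, clamping to the
/// first/last y value when x lies outside the covered range.
fn interpolate(points: &[(f64, f64)], x: f64) -> f64 {
    let (first, last) = (points[0], points[points.len() - 1]);
    if x <= first.0 {
        return first.1;
    }
    if x >= last.0 {
        return last.1;
    }
    for w in points.windows(2) {
        let ((x0, y0), (x1, y1)) = (w[0], w[1]);
        if x <= x1 {
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0);
        }
    }
    last.1
}

impl Ogive {
    /// Assumption: `bounds` are equi-depth histogram bounds on vid, so each
    /// of the `bounds.len() - 1` buckets holds the same share of `total_rows`.
    fn from_bounds(bounds: &[i64], total_rows: u64) -> Option<Ogive> {
        if bounds.len() < 2 || total_rows == 0 || bounds.windows(2).any(|w| w[0] >= w[1]) {
            return None;
        }
        let per_bucket = total_rows as f64 / (bounds.len() - 1) as f64;
        let points = bounds
            .iter()
            .enumerate()
            .map(|(i, &b)| (b as f64, i as f64 * per_bucket))
            .collect();
        Some(Ogive { points })
    }

    /// Estimated number of rows with a vid at or below `vid`.
    fn rows_up_to(&self, vid: i64) -> f64 {
        interpolate(&self.points, vid as f64)
    }

    /// Inverse of `rows_up_to`: the vid at which the estimate reaches `rows`.
    fn vid_for_rows(&self, rows: f64) -> i64 {
        let inverse: Vec<(f64, f64)> = self.points.iter().map(|&(v, r)| (r, v)).collect();
        interpolate(&inverse, rows).round() as i64
    }

    /// End of a batch that starts at `start` and should cover about
    /// `batch_rows` rows; clamped to the last histogram bound.
    fn next_end(&self, start: i64, batch_rows: u64) -> i64 {
        self.vid_for_rows(self.rows_up_to(start) + batch_rows as f64)
    }
}
```

With bounds `[0, 1000, 1100, 1200]` and 300 rows in total, a 50-row batch starting at vid 0 ends at vid 500, while the same batch starting at vid 1000 ends at vid 1050: the sparse start of the table gets a wide vid range and the dense part a narrow one.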

@lutter lutter self-assigned this Feb 8, 2025
@mangas mangas self-requested a review February 8, 2025 10:13
///
/// The word 'ogive' is somewhat obscure, but has a lot fewer letters than
/// 'piecewise linear function'. Copilot also claims that it is a lot
/// more fun to say.
Contributor

LOL

@lutter lutter force-pushed the lutter/nonlinear-batch branch from ff607b0 to b19b6c1 Compare February 11, 2025 01:53