After #9337, when shards restart and need to catch up on old WAL, each shard will pull WAL records from S3 and filter them. This results in O(catchup_ranges) work. We should do this work once across multiple shards, since we expect many shards to require catchup at roughly the same time.
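As a rough illustration of "do this work once", here is a minimal sketch of a single pass over a WAL range that filters records into per-shard buffers. Everything in it is an assumption for discussion: the `(lsn, payload)` record shape, the `shards_for` routing callback, and the `fan_out` name are hypothetical and do not come from the existing code.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical record shape: (lsn, payload).
Record = Tuple[int, bytes]


def fan_out(
    records: Iterable[Record],
    shards_for: Callable[[Record], Set[int]],
    lagging: Set[int],
) -> Dict[int, List[Record]]:
    """Read each record once and route it to every lagging shard that needs it."""
    out: Dict[int, List[Record]] = {shard: [] for shard in lagging}
    for record in records:  # single pass over the fetched WAL range
        for shard in shards_for(record):
            if shard in out:  # skip shards that are not catching up
                out[shard].append(record)
    return out
```

With this shape the fetch and decode cost is paid once per range, and only the cheap routing step scales with the number of lagging shards.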
TODO: details post-RFC.
Consider gossiping timeline progress between safekeepers to know how many shards are offline/lagging.
Consider memory budgeting. Simple approach: estimate timeline catchup volume from LSNs, acquire from semaphore, block when unavailable. Consider QoS to prioritize "important" tenants.
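The simple budgeting approach could look roughly like the sketch below: estimate the catchup volume as the LSN difference (LSNs are byte positions in the WAL), then acquire that many bytes from a counting semaphore and block until they are available. The `CatchupBudget` and `estimate_catchup_bytes` names are hypothetical, and clamping oversized requests to the total capacity is an assumption made here so that a single large catchup cannot block forever. QoS is not covered.

```python
import threading
from typing import Optional


def estimate_catchup_bytes(start_lsn: int, end_lsn: int) -> int:
    """Upper bound on WAL bytes to read: LSNs are byte positions in the WAL."""
    return max(0, end_lsn - start_lsn)


class CatchupBudget:
    """Byte-counting semaphore that blocks callers until budget is free."""

    def __init__(self, capacity_bytes: int) -> None:
        self._capacity = capacity_bytes
        self._available = capacity_bytes
        self._cond = threading.Condition()

    def acquire(self, nbytes: int, timeout: Optional[float] = None) -> int:
        # Assumption: clamp to capacity so one huge request cannot deadlock.
        want = min(nbytes, self._capacity)
        with self._cond:
            if not self._cond.wait_for(lambda: self._available >= want, timeout):
                raise TimeoutError("catchup memory budget unavailable")
            self._available -= want
            return want  # caller must release exactly this amount

    def release(self, nbytes: int) -> None:
        with self._cond:
            self._available += nbytes
            self._cond.notify_all()  # wake waiters so they re-check the budget
```

A catchup task would call `granted = budget.acquire(estimate_catchup_bytes(start, end))` before fetching and `budget.release(granted)` once the range has been processed.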