Targeted sweep tombstones causes perf degradation #7198
It seems the last one is already covered in atlasdb/atlasdb-impl-shared/src/main/java/com/palantir/atlasdb/sweep/queue/SweepDelay.java (line 81 at commit bea402a).

Generally clarifying: this issue is meant to be a follow-up from PDS-554079.
Yep, exactly.
Interesting. I think I see where this is coming from, though I'd be a little concerned about automatically extending sweep delays by an hour: the SLAs can probably take it, but clients might not, and if we have a few rolls in close-ish proximity this could have bad consequences, as sweep might remain inactive for the entire period. We currently have an exponential backoff that goes up to 30m; I'm not opposed to relaxing its parameters a bit, especially for early attempts, if we think that'll be useful. Maybe we can do some speculative thing where an InsufficientConsistencyException on a delete puts us in a suspected non-ALL state? Then, while in that state, we try a CL ALL get (or some other non-destructive operation) every 5m or so against the same set of nodes, only resuming normal sweep once that succeeds.
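The suspected-non-ALL idea above could be sketched roughly as follows. This is a minimal state machine, not actual AtlasDB code: the class and method names are hypothetical, and the 30-minute cap simply mirrors the existing exponential backoff mentioned in the comment.

```java
import java.time.Duration;

/**
 * Hypothetical sketch: after an InsufficientConsistencyException on a delete,
 * we enter a "suspected non-ALL" state in which destructive sweep is paused;
 * a non-destructive CL ALL probe (e.g. a get) moves us back to normal.
 */
final class ConsistencyStateTracker {
    enum State { NORMAL, SUSPECTED_NON_ALL }

    private State state = State.NORMAL;
    private int consecutiveFailures = 0;

    // Called when a delete at CL ALL throws InsufficientConsistencyException.
    void onInsufficientConsistency() {
        state = State.SUSPECTED_NON_ALL;
        consecutiveFailures++;
    }

    // Called when a non-destructive CL ALL probe against the same nodes succeeds.
    void onProbeSuccess() {
        state = State.NORMAL;
        consecutiveFailures = 0;
    }

    boolean sweepPaused() {
        return state == State.SUSPECTED_NON_ALL;
    }

    // Exponential backoff (1m, 2m, 4m, ...) capped at 30 minutes.
    Duration nextDelay() {
        long minutes = Math.min(30L, 1L << Math.min(consecutiveFailures, 5));
        return Duration.ofMinutes(minutes);
    }
}
```

Under this sketch, repeated failures in close proximity only ever push the delay to the existing 30m cap rather than stacking full one-hour pauses, while the probe lets sweep resume promptly once ALL is achievable again.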
In practice I think we just press on with "sweep at quorum", and either as part of that work or in a follow-up we try to poll the cluster's health to understand whether we're healthy enough for "sweep at ALL" to continue, rather than just attempting it with actual deletes being issued. Which I guess is vaguely the same as, or similar to, what you describe. But yes, I think it just comes down to how we handle InsufficientConsistencyException.
With targeted sweep, writes also generate a tombstone: a write to series1 + offset1 at commit timestamp t1 will cause a ranged tombstone to be added covering series1 + offset1 + [< t1].
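As a toy illustration of the pattern just described (illustrative names only, not AtlasDB's or Cassandra's actual data structures): each sweep of a write at commit timestamp t1 drops the versions below t1 but leaves behind one more ranged tombstone that reads of the cell must scan past, which is where the performance degradation comes from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Toy model of one cell (series1 + offset1): targeted sweep of a write at t1
 * removes every version strictly below t1 and leaves a ranged tombstone
 * covering [< t1] behind.
 */
final class CellVersions {
    private final NavigableMap<Long, String> versions = new TreeMap<>();
    private final List<Long> rangeTombstoneUpperBounds = new ArrayList<>();

    void put(long commitTimestamp, String value) {
        versions.put(commitTimestamp, value);
    }

    // Targeted sweep triggered by a write at t1: delete all versions below t1.
    void sweepBelow(long t1) {
        versions.headMap(t1, /* inclusive= */ false).clear();
        // The delete itself persists as a ranged tombstone reads must scan past.
        rangeTombstoneUpperBounds.add(t1);
    }

    int liveVersions() {
        return versions.size();
    }

    int tombstoneCount() {
        return rangeTombstoneUpperBounds.size();
    }
}
```

The point the model makes concrete: even though sweep shrinks the number of live versions, every swept write adds one more tombstone, so a frequently rewritten cell accumulates tombstones in proportion to how often it is swept.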
Solutions that come to mind are:
On InsufficientConsistencyException, retry only 1h from now?