Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lease: randomize expiry on initial refresh call #8101

Merged
merged 2 commits into from
Jun 15, 2017

Conversation

gyuho
Copy link
Contributor

@gyuho gyuho commented Jun 14, 2017

Address #8096.

@fanminshi
Copy link
Member

I wonder with rate limit revoking in runLoop added, is TLL randomization still necessary?

@gyuho gyuho added the WIP label Jun 14, 2017
lease/lessor.go Outdated
@@ -475,6 +476,9 @@ func (le *lessor) initAndRecover() {
if lpb.TTL < le.minLeaseTTL {
lpb.TTL = le.minLeaseTTL
}
// randomize +/-10%, so that recovered leases with same TTL
// do not expire all at the same time
lpb.TTL += rand.Int63n(lpb.TTL/10) * int64((-(i & 1))|1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lpb.TTL *= 1.0+(0.1*(rand.Float64()-0.5))? I think Int63n will panic if TTL < 10 since the docs say it expects >= 1.

@heyitsanthony
Copy link
Contributor

@fanminshi without randomization the leases are forced to wait to drain at 1000 leases/s so if there's 10k leases, some leases will need to wait 10s to expire. If there's TTL randomization the queuing effect will be less pronounced: suppse those 10k leases TTLs are distributed over 10 seconds-- on average the leases will expire immediately since there's 1k leases expiring every second instead of 10k leases expiring in one instant.

@fanminshi
Copy link
Member

@heyitsanthony make sense.

@gyuho gyuho removed the WIP label Jun 14, 2017
@gyuho gyuho changed the title lease: randomize TTL on recovery lease: randomize expiry on initial refresh call Jun 14, 2017
@gyuho gyuho added the WIP label Jun 14, 2017
@gyuho gyuho force-pushed the randomize-renew branch 2 times, most recently from e6d81d9 to d6c5035 Compare June 14, 2017 23:21
@gyuho gyuho removed the WIP label Jun 14, 2017
@gyuho
Copy link
Contributor Author

gyuho commented Jun 14, 2017

@heyitsanthony PTAL. Thanks.

@gyuho gyuho added the WIP label Jun 15, 2017
@gyuho gyuho force-pushed the randomize-renew branch 2 times, most recently from cae0e47 to 58f4999 Compare June 15, 2017 01:14
@gyuho gyuho removed the WIP label Jun 15, 2017
lease/lessor.go Outdated
@@ -491,6 +492,7 @@ func (le *lessor) initAndRecover() {
itemSet: make(map[LeaseItem]struct{}),
expiry: forever,
revokec: make(chan struct{}),
fresh: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably this can be a flag on lessor itself instead? we change the extend accordingly. i do not feel we should put the calculation logic into lease.

lease/lessor.go Outdated
@@ -147,6 +148,9 @@ type lessor struct {
stopC chan struct{}
// doneC is a channel whose closure indicates that the lessor is stopped.
doneC chan struct{}

// 'true' when lease is just recovered
fresh bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

promoting? // 'true' when the lessor is promoting

lease/lessor.go Outdated
l.refresh(extend)
ext := extend
if fresh {
// randomize expiry with 士10%, otherwise leases of same TTL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to a func?

lease/lessor.go Outdated
expiredC: make(chan []*Lease, 16),
stopC: make(chan struct{}),
doneC: make(chan struct{}),
promoting: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be set to false initially? when promote is called, we should change promoting to true. after finishing promotion, we set it back to false again. right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

promoting field is only used by Promote method. If we set it true and then false inside Promote, there is no way for Promote to know if it has promoted before? Or if we want to randomize for every Promote call, we can just remove the field?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when there is a new leader, the new leader will call promote to refresh all its leases, right? for each refresh caused by election, we need the randomization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we only call Promote after leader election https://github.com/coreos/etcd/blob/master/etcdserver/server.go#L1307-L1319. So should be safe to randomize for every Promote call, and no need promoting field?

@xiang90
Copy link
Contributor

xiang90 commented Jun 15, 2017

lgtm. defer to @heyitsanthony

Randomize the very first expiry on lease recovery
to prevent recovered leases from expiring all at
the same time.

Address etcd-io#8096.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
Lease with TTL 5 should be renewed with randomization,
thus it's still possible to exist after 3 seconds.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
@gyuho gyuho merged commit b9a53db into etcd-io:master Jun 15, 2017
@gyuho gyuho deleted the randomize-renew branch June 15, 2017 23:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants