Fix race condition in ExponentiallyDecayingReservoir's rescale method #1033

Merged: 2 commits merged into dropwizard:3.2-development on Jan 5, 2017
Conversation

ho3rexqj
Contributor

Fixes issues #1005 and #793 - note that in #793 the longPeriodsOfInactivityShouldNotCorruptSamplingState test did not cover multiple concurrent requests after a long idle period.

The race condition currently occurs when two threads update the reservoir concurrently: after the first thread has successfully updated nextScaleTime (line 162) but before it obtains the write lock (line 163), a second thread can update the reservoir (line 95+). With several threads pushing updates concurrently, this presents a narrow window during which the second thread may obtain the lock before the first, resulting in a corrupt reservoir state. Note, however, that if a snapshot is requested just before the two concurrent updates, the read lock will already be held when the two threads enter rescaleIfNeeded, which widens the window for this race considerably.
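A minimal sketch of the pre-fix ordering being described (field and method names follow ExponentiallyDecayingReservoir, but the bodies are simplified illustrations, not the library's exact source):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the pre-fix control flow; simplified, not the exact source.
class RescaleSketch {
    private static final long RESCALE_THRESHOLD = 3_600_000_000_000L; // roughly one hour in ticks
    private final AtomicLong nextScaleTime = new AtomicLong();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void rescaleIfNeeded(long now) {
        final long next = nextScaleTime.get();
        if (now >= next) {
            rescale(now, next);
        }
    }

    private void rescale(long now, long next) {
        // Exactly one thread wins the CAS...
        if (nextScaleTime.compareAndSet(next, now + RESCALE_THRESHOLD)) {
            // ...but between the CAS above and acquiring the write lock below,
            // other threads already see the advanced nextScaleTime, skip the
            // rescale, and proceed to update the reservoir under the read lock;
            // one of them may acquire its lock before this thread does.
            lock.writeLock().lock();
            try {
                // advance startTime and reweight the stored samples here
            } finally {
                lock.writeLock().unlock();
            }
        }
    }
}
```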

@arteam arteam merged commit 4abcd4d into dropwizard:3.2-development Jan 5, 2017
@arteam
Member

arteam commented Jan 5, 2017

Thank you very much for the fix and very detailed unit test!

@ryantenney
Contributor

I have an issue with this. The compareAndSet operation is atomic, so it shouldn't allow more than one thread into that if block. Granted, 'lockForRescale()' effectively does the same thing; however, it is a more expensive operation than the compareAndSet.

@arteam
Member

arteam commented Jan 5, 2017

The issue as I see it:

  • Thread 1 calls update
  • Thread 2 calls update
  • Thread 1 calls rescaleIfNeeded
  • Thread 2 calls rescaleIfNeeded
  • Thread 1 reads the CAS and updates it
  • Thread 1 gets swapped out
  • Thread 2 reads the updated CAS and exits
  • Thread 2 obtains a read lock and works with a new weight, but old (not scaled) values
  • Thread 1 gets swapped in and acquires a write lock, but it's too late

Basically, the CAS only guarantees that nextScaleTime gets updated atomically; it does not guarantee anything about the rest of the reservoir's state.
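A minimal sketch of how that window can be closed, in the spirit of this PR (an illustration of the approach, not necessarily the exact patch): perform the deadline check and the CAS only while the write lock is held, so the entire rescale is serialized against concurrent updates.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the fix direction: CAS and rescale happen under the write lock,
// so no update() can interleave between them. Simplified, not the exact patch.
class RescaleUnderLockSketch {
    private static final long RESCALE_THRESHOLD = 3_600_000_000_000L; // roughly one hour in ticks
    private final AtomicLong nextScaleTime = new AtomicLong();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void rescaleIfNeeded(long now) {
        final long next = nextScaleTime.get();
        if (now >= next) {
            lock.writeLock().lock(); // excludes concurrent update()s holding the read lock
            try {
                // Re-check under the lock: only the first thread in wins the CAS
                // and rescales; latecomers see the advanced deadline and do nothing.
                if (nextScaleTime.compareAndSet(next, now + RESCALE_THRESHOLD)) {
                    // advance startTime and reweight the stored samples here
                }
            } finally {
                lock.writeLock().unlock();
            }
        }
    }
}
```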

@arteam
Member

arteam commented Jan 9, 2017

If the fix is correct, I would like to cherry-pick this change to the 3.1.3-maintenance branch to give users who are affected by this bug an easy upgrade path.

@arteam
Member

arteam commented Jan 23, 2017

I haven't heard any further objections to this fix, so I'm going to move this bugfix to the 3.1.3 branch. In the worst case we have a performance regression; in the best case we fix a concurrency bug.

@arteam arteam added this to the 3.1.3 milestone Jan 23, 2017
@arteam arteam mentioned this pull request Jan 23, 2017
@arteam arteam modified the milestones: 3.2.0, 3.1.3 Jan 23, 2017
@ho3rexqj ho3rexqj deleted the fix/exponentially_decaying_reservoir_rescale_race branch February 3, 2017 03:13
@ahadadi

ahadadi commented May 1, 2017

@arteam, I think your comment "Thread 2 obtains a read lock and works with a new weight, but old (not scaled) values" is incorrect.
The weight only becomes new when startTime gets updated, and startTime is updated only under the write lock held by thread 1.
I still think this is a good change, as it simplifies reasoning about the code.
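For context, a simplified sketch of the relationship being pointed out here, assuming the reservoir's usual scheme (illustrative, not the library's exact code): a sample's weight is derived from the elapsed time since startTime, so a "new" weight can only appear after startTime has been advanced, which happens only under the write lock during a rescale.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified sketch: the weight is a function of (timestamp - startTime), and
// startTime moves only while the write lock is held. Illustrative bodies only.
class WeightSketch {
    private final double alpha = 0.015;
    private volatile long startTime;                       // seconds; advanced only under the write lock
    private final ConcurrentSkipListMap<Double, Long> values = new ConcurrentSkipListMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    private double weight(long elapsedSeconds) {
        return Math.exp(alpha * elapsedSeconds);           // exponential decay weight
    }

    void update(long value, long timestampSeconds) {
        lock.readLock().lock();                            // many updaters may run concurrently
        try {
            // Until a rescale moves startTime, every updater computes weights
            // against the same baseline, whether or not a rescale is pending.
            final double itemWeight = weight(timestampSeconds - startTime);
            final double priority = itemWeight / Math.random();
            values.put(priority, value);                   // the real reservoir also bounds the map size
        } finally {
            lock.readLock().unlock();
        }
    }
}
```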
