CHE-1143: fixed synchronization issue during removal of several projects #1343

dkulieshov · 2016-05-24T11:58:13Z

Hi,

Recently I have been working on some bugs in the Eclipse Che project management infrastructure. The bug report reads as follows:

When we delete several projects simultaneously (in the same running workspace, via the project API):

We get a 204 status on each request, but the projects are not deleted
In some cases we get a 500 status combined with java.util.ConcurrentModificationException

The first point is a separate question: several conceptual problems need to be resolved first, and as far as I know that discussion is not yet complete.

For the second item I created a dedicated test that covers the bug. Although the bug no longer reproduces, most likely thanks to this commit, the test revealed another problem. Occasionally we get the following exception:

java.lang.AssertionError: Error: Comparison method violates its general contract! expected [204] but found [500]
        at org.testng.Assert.fail(Assert.java:94)
        at org.testng.Assert.failNotEquals(Assert.java:494)
        at org.testng.Assert.assertEquals(Assert.java:123)
        at org.testng.Assert.assertEquals(Assert.java:370)
        at org.eclipse.che.api.project.server.ProjectServiceTest.testDeleteProjectsConcurrently(ProjectServiceTest.java:981)

when we try to remove 100 projects simultaneously (in 100 threads), and it reproduces every time with 1,000 or more threads.

I assume the problem is the following. At some point during project deletion we need the sorted list of children of the project's parent in the file system, which takes two operations: get the list, then sort it. These operations are not synchronized, so occasionally another thread changes the state of the children collection between them. This results in the following exception:

java.lang.IllegalArgumentException: Comparison method violates its general contract!
    at java.util.ComparableTimSort.mergeHi(ComparableTimSort.java:835)
    at java.util.ComparableTimSort.mergeAt(ComparableTimSort.java:453)
    at java.util.ComparableTimSort.mergeForceCollapse(ComparableTimSort.java:392)
    at java.util.ComparableTimSort.sort(ComparableTimSort.java:191)
    at java.util.ComparableTimSort.sort(ComparableTimSort.java:146)
    at java.util.Arrays.sort(Arrays.java:472)
    at java.util.Collections.sort(Collections.java:155)

This typically happens when the runs of children that TimSort is merging are not actually sorted (because another thread changed the state of some elements, or the elements themselves, after they had been sorted), or when the elements are sorted but the children's compareTo method violates its contract (e.g. it is not transitive).

My suggestion is to synchronize these two operations. Given the kind of operations involved here, this should not have a significant performance impact, and it leaves the general logic of a low-level component such as the file system untouched.
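A minimal sketch of the proposed fix, assuming a simplified folder whose children are kept in a `HashMap` keyed by name. `FolderNode`, `addChild`, `removeChild` and `getSortedChildren` are hypothetical names for illustration, not the actual Che file-system API. Taking the snapshot of the children and sorting it happen under the same lock that removal uses, so no other thread can change the collection between the two steps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-in for a file-system folder; the names are
// illustrative and are not the actual Che API.
class FolderNode {
    // Children keyed by name; every access is guarded by the monitor of "this".
    private final Map<String, String> children = new HashMap<>();

    public synchronized void addChild(String name) {
        children.put(name, name);
    }

    public synchronized boolean removeChild(String name) {
        return children.remove(name) != null;
    }

    // "Get" and "sort" run as one atomic step: the snapshot is taken and
    // sorted while holding the same lock that removeChild() needs, so no
    // other thread can modify the children in between.
    public synchronized List<String> getSortedChildren() {
        List<String> snapshot = new ArrayList<>(children.values());
        Collections.sort(snapshot);
        return snapshot;
    }

    public static void main(String[] args) throws InterruptedException {
        FolderNode folder = new FolderNode();
        for (int i = 0; i < 1000; i++) {
            folder.addChild("project-" + i);
        }
        // Delete every child from many threads while also listing the folder.
        ExecutorService pool = Executors.newFixedThreadPool(16);
        for (int i = 0; i < 1000; i++) {
            String name = "project-" + i;
            pool.submit(() -> {
                folder.removeChild(name);
                folder.getSortedChildren();
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("children left: " + folder.getSortedChildren().size());
    }
}
```

Sorting a private copy instead of the shared collection also means a caller holding the returned list is unaffected by later removals. The trade-off is that listings and removals on the same folder are serialized, which is consistent with the expectation above that the performance cost should be small for these operations.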

TylerJewell · 2016-05-24T12:03:13Z

I like the approach. Welcome back @dkuleshov. Are you a contributor to the Eclipse Foundation? Have you signed the CLA? The signed-off check failed.

gazarenkov · 2016-05-24T12:15:44Z

+1
As I understand it, and as we discussed with @evoevodin, the big problem is that the operations were not synchronized.

And yes, great way to explain it.

… projects Signed-off-by: Dmitry Kuleshov <dkuleshov@codenvy.com>

dkulieshov · 2016-05-24T12:48:14Z

@TylerJewell It's nice to be back 😄. I'm not a contributor yet, just a member. The CLA is signed, but there was a git misconfiguration on my side. It looks like the signed-off check passes now.

voievodin · 2016-05-24T14:12:59Z

...he-core-api-project/src/test/java/org/eclipse/che/api/project/server/ProjectServiceTest.java

+        IntStream.range(0, threadNumber).forEach(
+                i -> {
+                    futures.add(executor.submit(() -> {
+                        countDownLatch.countDown();


Hmm, it seems you don't have to call countDown() in each task; you can create a CountDownLatch with a count of 1 and call .countDown() once after the forEach.

dkulieshov · 2016-05-24

I'm not sure I got your idea. Could you explain in more detail?

voievodin · 2016-05-24

The idea: await 100 times, count down once.
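A minimal sketch of the "await 100 times, count down once" start gate, using only `java.util.concurrent`. `StartGate.runConcurrently` is a hypothetical helper, and the `Runnable` passed to it stands in for the real project-deletion request made by `ProjectServiceTest`. Every worker blocks on a latch created with a count of 1, and a single `countDown()` after the submission loop releases them all at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not part of ProjectServiceTest.
class StartGate {

    // Runs `task` in `threadNumber` threads that all wait on one latch and are
    // released together, maximizing contention. Returns the number of tasks run.
    static int runConcurrently(int threadNumber, Runnable task) throws Exception {
        CountDownLatch startSignal = new CountDownLatch(1); // a single "go" signal
        ExecutorService executor = Executors.newFixedThreadPool(threadNumber);
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < threadNumber; i++) {
                futures.add(executor.submit(() -> {
                    startSignal.await(); // each worker awaits...
                    task.run();
                    return null;
                }));
            }
            startSignal.countDown(); // ...and one countDown() releases them all
            for (Future<?> future : futures) {
                future.get(); // rethrows a worker failure as ExecutionException
            }
        } finally {
            executor.shutdown();
        }
        return futures.size();
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger deleted = new AtomicInteger();
        // The counter stands in for a real "delete project" request.
        int ran = runConcurrently(100, deleted::incrementAndGet);
        System.out.println("ran: " + ran + ", deleted: " + deleted.get()); // prints "ran: 100, deleted: 100"
    }
}
```

In this sketch an exception thrown by a worker surfaces as an `ExecutionException` from `future.get()`, so it is not silently lost.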

garagatyi · 2016-05-24

Please consider using a CyclicBarrier here instead of a CountDownLatch.

One major difference is that CyclicBarrier takes an (optional) Runnable task which is run once the common barrier condition is met.

It also allows you to get the number of clients waiting at the barrier and the number required to trigger the barrier. Once triggered, the barrier is reset and can be used again.

For simple use cases (services starting, etc.) a CountDownLatch is fine. A CyclicBarrier is useful for more complex coordination tasks. An example of such a thing would be parallel computation, where multiple subtasks are involved in the computation, somewhat like MapReduce.

http://stackoverflow.com/questions/4168772/java-concurrency-countdown-latch-vs-cyclic-barrier
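For comparison, a sketch of the same start gate built on `CyclicBarrier`, as suggested. Here the workers release each other: the barrier trips when the last party calls `await()`, and the optional barrier action runs once, in that last thread, before any party proceeds. `BarrierGate` and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration.
class BarrierGate {

    // Runs `task` in `parties` threads that meet at one CyclicBarrier. The
    // barrier trips when the last party arrives; `onTrip` then runs once, in
    // that last thread, before any party continues to `task`.
    static void runConcurrently(int parties, Runnable onTrip, Runnable task) throws Exception {
        CyclicBarrier barrier = new CyclicBarrier(parties, onTrip);
        // All parties must be blocked at the same time, so the pool needs at
        // least `parties` threads; a smaller pool would deadlock.
        ExecutorService executor = Executors.newFixedThreadPool(parties);
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < parties; i++) {
                futures.add(executor.submit(() -> {
                    barrier.await();
                    task.run();
                    return null;
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // rethrows a worker failure as ExecutionException
            }
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger trips = new AtomicInteger();
        AtomicInteger runs = new AtomicInteger();
        runConcurrently(100, trips::incrementAndGet, runs::incrementAndGet);
        System.out.println("trips: " + trips.get() + ", runs: " + runs.get()); // prints "trips: 1, runs: 100"
    }
}
```

Neither the barrier action nor the reuse after a trip is needed for a one-shot start signal, so both primitives serve this test. One practical difference: a `CyclicBarrier` needs all parties blocked at the same time, so the pool must have at least `parties` threads, whereas the count-of-1 latch also works with a smaller pool.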

dkulieshov · 2016-05-24

There is no need to run a Runnable
There are no cycles
So could you please help me understand the point of using a CyclicBarrier instead?

codenvy-ci · 2016-05-25T01:53:43Z

Build success. http://ci.codenvy-dev.com/jenkins/job/che-pullrequests-build/694/

CHE-1143: fixed minor synchronization issue during removal of several projects (4088de7)

Signed-off-by: Dmitry Kuleshov <dkuleshov@codenvy.com>

voievodin reviewed May 24, 2016

vparfonov merged commit 0adad4a into eclipse-che:master May 25, 2016

dkulieshov deleted the CHE-1143 branch May 25, 2016 13:04

dkulieshov commented May 24, 2016 • edited by benoitf
