Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CHE-1143: fixed synchronization issue during removal of several projects #1343

Merged
merged 1 commit into from
May 25, 2016
Merged

Conversation

dkulieshov
Copy link

@dkulieshov dkulieshov commented May 24, 2016

_1 Upvote_ Hi,

Recently I was working on some bugs in Eclipse Che project management infrastructure. The report itself looks like:

When we delete several projects simultaneously (in the same running workspace with project api):

  1. We get 204 status on each request however projects are not deleted
  2. In some cases we get 500 status combined with java.util.ConcurrentModificationException

The first point is a separate question and there are several ideological problems that should be solved first. As far as I know, the discussion is not yet complete.

And for the second item I created a dedicated test to cover the bug. Though the bug is not reproduced anymore most likely thanks to this commit, I've spotted another problem. We rarely get exception:

java.lang.AssertionError: Error: Comparison method violates its general contract! expected [204] but found [500]
        at org.testng.Assert.fail(Assert.java:94)
        at org.testng.Assert.failNotEquals(Assert.java:494)
        at org.testng.Assert.assertEquals(Assert.java:123)
        at org.testng.Assert.assertEquals(Assert.java:370)
        at org.eclipse.che.api.project.server.ProjectServiceTest.testDeleteProjectsConcurrently(ProjectServiceTest.java:981)

when we try to simultaneously remove 100 projects (in 100 threads), while it is always reproduced for 1k+ threads.

I assume that the problem is the following. At some point of project deletion we are to get its filesystem parent's sorted children list, the process is a set of two operations: get and sort the list. These operations are not synchronized so sometimes it happens to change somehow state of children collections by another thread. This results in a following exception:

java.lang.IllegalArgumentException: Comparison method violates its general contract!
    at java.util.ComparableTimSort.mergeHi(ComparableTimSort.java:835)
    at java.util.ComparableTimSort.mergeAt(ComparableTimSort.java:453)
    at java.util.ComparableTimSort.mergeForceCollapse(ComparableTimSort.java:392)
    at java.util.ComparableTimSort.sort(ComparableTimSort.java:191)
    at java.util.ComparableTimSort.sort(ComparableTimSort.java:146)
    at java.util.Arrays.sort(Arrays.java:472)
    at java.util.Collections.sort(Collections.java:155)

That basically happens when arrays of children that are to be merged are indeed not sorted (because of impact of the other thread: there are changes of some elements' state or elements themselves after they are already sorted) or elements are sorted but children's compareTo method cannot fulfill its contract (e.g. does not provide transitivity).

My suggestion is to synchronize those two operations, which should not have great impact on performance (taking into account the kind of operations that we have here) and at the same time we will not impact general logic of such a low level component as file system.

@TylerJewell
Copy link

I like the approach. Welcome back @dkuleshov. Are you a contributor to the eclipse foundation? Have you signed the cla? The signed off check failed.

@gazarenkov
Copy link
Contributor

+1
As I understand and that's what we discussed with @evoevodin the big problem is that operations have not been syched.

And yes great approach to explain

… projects

Signed-off-by: Dmitry Kuleshov <dkuleshov@codenvy.com>
@dkulieshov
Copy link
Author

@TylerJewell It's nice to be back 😄. I'm not a contributor yet, just a member. The CLA is signed but there was a git misconfiguration on my side. Looks like now the signed off check is okay.

IntStream.range(0, threadNumber).forEach(
i -> {
futures.add(executor.submit(() -> {
countDownLatch.countDown();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, it seems that you don't have to countDown() each time, you can create a CountDownLatch with 1 as count and perform .countDown() once after forEach.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that I got your idea, could you explain more?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea: await 100 times, count down once.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider usage of barrier here instead of countdown latch

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One major difference is that CyclicBarrier takes an (optional) Runnable task which is run once the common barrier condition is met.

It also allows you to get the number of clients waiting at the barrier and the number required to trigger the barrier. Once triggered the barrier is reset and can be used again.

For simple use cases - services starting etc... a CountdownLatch is fine. A CyclicBarrier is useful for more complex co-ordination tasks. An example of such a thing would be parallel computation - where multiple subtasks are involved in the computation - kind of like MapReduce.

http://stackoverflow.com/questions/4168772/java-concurrency-countdown-latch-vs-cyclic-barrier

There is no need to run Runnable
There are no cycles
So please could you make me understand what is the point of using CyclicBarrier instead?

@codenvy-ci
Copy link

@vparfonov vparfonov merged commit 0adad4a into eclipse-che:master May 25, 2016
@dkulieshov dkulieshov deleted the CHE-1143 branch May 25, 2016 13:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants