[BUG] org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot is flaky #9115

Open
ashking94 opened this issue Aug 4, 2023 · 8 comments · Fixed by #10379
Labels
bug Something isn't working flaky-test Random test failure that succeeds on second run

Comments

@ashking94
Member

Describe the bug
The org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot test is flaky on the main branch. I ran the test in a loop and it failed on the 15th iteration.

To Reproduce
The same seed does not always reproduce the failure. To reproduce, run the test in a loop until it fails.

Expected behavior
The test should pass.

Additional context
Jenkins build failure link - https://build.ci.opensearch.org/job/gradle-check/21871/

@ashking94 ashking94 added bug Something isn't working untriaged labels Aug 4, 2023
@ashking94
Member Author

@kasundra07 @harishbhakuni21 fyi

@sachinpkale
Member

Not able to reproduce the failure locally, even after 1000 attempts. Closing.

@sohami
Collaborator

sohami commented Sep 21, 2023

Reopening this, as the test is failing again:

Ref CI: https://build.ci.opensearch.org/job/gradle-check/25984/

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot" -Dtests.seed=5A77171FC14EEBF7 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=vi -Dtests.timezone=PRC -Druntime.java=20
java.lang.AssertionError: 
Expected: is <7>
     but: was <4>
	at __randomizedtesting.SeedInfo.seed([5A77171FC14EEBF7:5F34FA1617FAB2A9]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:964)
	at org.junit.Assert.assertThat(Assert.java:930)
	at org.opensearch.snapshots.AbstractSnapshotIntegTestCase.createFullSnapshot(AbstractSnapshotIntegTestCase.java:489)
	at org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot(DeleteSnapshotIT.java:85)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:1623)

@sohami sohami reopened this Sep 21, 2023
@sohami
Collaborator

sohami commented Sep 21, 2023

@harishbhakuni Can you take a look at this?

@andrross
Member

andrross commented Oct 4, 2023

I can get this to fail every time with the following seed:

./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot" -Dtests.seed=4CD3155D4F1C1A9F
java.lang.AssertionError: java.lang.IllegalArgumentException: Provided Lock Name metadata__9223372036854775806__9223372036854775803__9223372036854775790__9223372036854775800___Hf3Dbw2QQagfGLlVBOUrg__9223370340398865071__1___ZxZ4Wh89SXyEPmSYAHrIrQ.lock is not Valid.
	at __randomizedtesting.SeedInfo.seed([4CD3155D4F1C1A9F]:0)
	at org.opensearch.repositories.blobstore.BlobStoreRepository.lambda$executeOneStaleIndexDelete$37(BlobStoreRepository.java:1627)
	at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
	at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.IllegalArgumentException: Provided Lock Name metadata__9223372036854775806__9223372036854775803__9223372036854775790__9223372036854775800___Hf3Dbw2QQagfGLlVBOUrg__9223370340398865071__1___ZxZ4Wh89SXyEPmSYAHrIrQ.lock is not Valid.
	at org.opensearch.index.store.lockmanager.FileLockInfo$LockFileUtils.getAcquirerIdFromLock(FileLockInfo.java:103)
	at org.opensearch.index.store.lockmanager.FileLockInfo.lambda$getLockForAcquirer$0(FileLockInfo.java:59)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176)
	at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
	at org.opensearch.index.store.lockmanager.FileLockInfo.getLockForAcquirer(FileLockInfo.java:60)
	at org.opensearch.index.store.lockmanager.RemoteStoreMetadataLockManager.release(RemoteStoreMetadataLockManager.java:65)
	at org.opensearch.repositories.blobstore.BlobStoreRepository.lambda$executeOneStaleIndexDelete$37(BlobStoreRepository.java:1590)
	... 7 more

@harishbhakuni
Contributor

harishbhakuni commented Oct 4, 2023

> metadata__9223372036854775806__9223372036854775803__9223372036854775790__9223372036854775800___Hf3Dbw2QQagfGLlVBOUrg__9223370340398865071__1___ZxZ4Wh89SXyEPmSYAHrIrQ.lock

This issue is fixed with this PR: #10217

> I can get this to fail every time with the following seed:
>
> ./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot" -Dtests.seed=4CD3155D4F1C1A9F

I haven't seen this one before. It looks like a UUID generation issue. Let me check.
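
For illustration, here is a minimal Java sketch of how a lock name like the one in the trace could be rejected by a separator-based parse. It is only a guess at the mechanism: it assumes the lock file name is built as file name + `___` + acquirer id + `.lock` and parsed by splitting on `___`. The class and method names (`LockNameSketch`, `acquirerIdBySplit`, `acquirerIdByLastIndex`) are hypothetical and are not taken from `FileLockInfo`.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the OpenSearch implementation.
final class LockNameSketch {
    // Assumed separator between the locked file name and the acquirer id.
    static final String SEPARATOR = "___";
    // Assumed lock file suffix.
    static final String SUFFIX = ".lock";

    // Strips the ".lock" suffix, rejecting names that do not carry it.
    private static String stripSuffix(String lockName) {
        if (!lockName.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("Provided Lock Name " + lockName + " is not Valid.");
        }
        return lockName.substring(0, lockName.length() - SUFFIX.length());
    }

    // Naive parse: only accepts names where the separator occurs exactly once.
    static String acquirerIdBySplit(String lockName) {
        String[] parts = stripSuffix(lockName).split(Pattern.quote(SEPARATOR));
        if (parts.length != 2) {
            throw new IllegalArgumentException("Provided Lock Name " + lockName + " is not Valid.");
        }
        return parts[1];
    }

    // More tolerant parse: takes everything after the last separator.
    static String acquirerIdByLastIndex(String lockName) {
        String base = stripSuffix(lockName);
        int idx = base.lastIndexOf(SEPARATOR);
        if (idx < 0) {
            throw new IllegalArgumentException("Provided Lock Name " + lockName + " is not Valid.");
        }
        return base.substring(idx + SEPARATOR.length());
    }

    public static void main(String[] args) {
        // Lock name copied from the stack trace above. The "__" field delimiter
        // followed by an id that starts with "_" produces an extra "___".
        String lockName = "metadata__9223372036854775806__9223372036854775803"
            + "__9223372036854775790__9223372036854775800___Hf3Dbw2QQagfGLlVBOUrg"
            + "__9223370340398865071__1___ZxZ4Wh89SXyEPmSYAHrIrQ.lock";
        try {
            System.out.println("split: " + acquirerIdBySplit(lockName));
        } catch (IllegalArgumentException e) {
            System.out.println("split rejected the name: " + e.getMessage());
        }
        System.out.println("lastIndexOf: " + acquirerIdByLastIndex(lockName));
    }
}
```

Under these assumptions the split-based parse sees three pieces instead of two, because `__` followed by an id beginning with `_` reads as a second `___`. Taking the text after the last separator handles this particular name, but it would still be ambiguous if the acquirer id itself began with `_`.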

@peternied
Member

[Triage - attendees 1 2 3 4 5 6 7]
Looks like this might still be an issue; reopening so it is investigated.

@peternied
Member

From the other issue: https://build.ci.opensearch.org/job/gradle-check/37349/testReport/

java.lang.AssertionError: 
Expected: is <9>
     but: was <8>
	at __randomizedtesting.SeedInfo.seed([30BCA240DC8694B0:35FF4F490A32CDEE]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:964)
	at org.junit.Assert.assertThat(Assert.java:930)
	at org.opensearch.snapshots.AbstractSnapshotIntegTestCase.createFullSnapshot(AbstractSnapshotIntegTestCase.java:497)
	at org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot(DeleteSnapshotIT.java:92)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot" -Dtests.seed=30BCA240DC8694B0 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=fr-CA -Dtests.timezone=Antarctica/Vostok -Druntime.java=21

7 participants