
[BUG] org.opensearch.remotestore.RemoteStoreStatsIT.testStatsOnShardUnassigned is flaky #9036

Closed
ashking94 opened this issue Aug 1, 2023 · 3 comments · Fixed by #9057
Labels
bug Something isn't working Storage:Durability Issues and PRs related to the durability framework Storage Issues and PRs relating to data and metadata storage untriaged

Comments

@ashking94
Member

Describe the bug
The org.opensearch.remotestore.RemoteStoreStatsIT.testStatsOnShardUnassigned test is flaky on the main branch. This failure was encountered during the PR build of #8758.

To Reproduce
REPRODUCE WITH:

./gradlew ':server:internalClusterTest' --tests "org.opensearch.remotestore.RemoteStoreStatsIT.testStatsOnShardUnassigned" -Dtests.seed=1EE76B3CC8051A6 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=hu-HU -Dtests.timezone=Africa/Bissau -Druntime.java=20

Expected behavior
The test should pass consistently.

Additional context
Jenkins - https://build.ci.opensearch.org/job/gradle-check/21584/testReport/junit/org.opensearch.remotestore/RemoteStoreStatsIT/testStatsOnShardUnassigned/

@ashking94 ashking94 added bug Something isn't working untriaged labels Aug 1, 2023
@ashking94
Member Author

@shourya035 fyi

@ashking94
Member Author

Analysing the failure a bit, it looks like the node count is still 3 even after the node has been stopped.
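If the stale count of 3 is only a transient view that has not yet caught up with the stopped node (an assumption, not something confirmed here), a check made immediately after stopping the node can race with the cluster state update. A common way to make such a check tolerant is to poll until the count settles instead of asserting once. The sketch below is a hypothetical, self-contained illustration of that pattern: `NodeCountWaiter` and `waitForNodeCount` are illustrative names, not OpenSearch APIs, and this is not the change that closed the issue. Inside a real integration test, the framework's own helpers (for example `assertBusy`) would be the natural choice.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

/** Hypothetical helper: waits for a node count to settle instead of asserting once. */
public final class NodeCountWaiter {

    private NodeCountWaiter() {}

    /**
     * Polls nodeCount until it equals expected or the timeout elapses.
     * Returns true if the expected count was observed, false on timeout or interrupt.
     */
    public static boolean waitForNodeCount(IntSupplier nodeCount, int expected,
                                           Duration timeout, Duration pollInterval) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (nodeCount.getAsInt() == expected) {
                return true;
            }
            // Compare via subtraction so the check stays correct if nanoTime overflows.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated cluster view: reports 3 nodes for the first three polls, then 2,
        // mimicking a view that lags behind a node that was just stopped.
        final int[] polls = {0};
        IntSupplier simulatedCount = () -> ++polls[0] < 4 ? 3 : 2;
        boolean settled = waitForNodeCount(simulatedCount, 2, Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println("settled=" + settled + " after " + polls[0] + " polls"); // settled=true after 4 polls
    }
}
```

A single immediate assertion would fail on the first poll here, while the polling version succeeds once the simulated view catches up; the timeout still bounds the wait so a genuinely stuck cluster fails the check.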

@ashking94 ashking94 added Storage Issues and PRs relating to data and metadata storage Storage:Durability Issues and PRs related to the durability framework labels Aug 2, 2023
@ashking94
Member Author

Also seeing failures caused by an uncaught exception in the cluster state update task:

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.remotestore.RemoteStoreStatsIT.testStatsOnShardUnassigned" -Dtests.seed=5330C7A06AF381B5 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ms -Dtests.timezone=Africa/Nouakchott -Druntime.java=20

org.opensearch.remotestore.RemoteStoreStatsIT > testStatsOnShardUnassigned FAILED
    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=70, name=opensearch[node_t1][clusterApplierService#updateTask][T#1], state=RUNNABLE, group=TGRP-RemoteStoreStatsIT]
        at __randomizedtesting.SeedInfo.seed([5330C7A06AF381B5:A5FA04C1C02A5874]:0)

        Caused by:
        java.lang.AssertionError: a started primary with non-pending operation term must be in primary mode [remote-store-test-idx-1][0], node[3dgi0k4bQvm6tE_udLNSlA], [P], s[STARTED], a[id=oOnPRo_VQYu46NzG2Kr5GA]
            at __randomizedtesting.SeedInfo.seed([5330C7A06AF381B5]:0)
            at org.opensearch.index.shard.IndexShard.updateShardState(IndexShard.java:735)
            at org.opensearch.indices.cluster.IndicesClusterStateService.updateShard(IndicesClusterStateService.java:714)
            at org.opensearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:654)
            at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:297)
            at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:606)
            at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:593)
            at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:561)
            at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:484)
            at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:186)
            at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849)
            at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282)
            at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245)
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
            at java.base/java.lang.Thread.run(Thread.java:1623)
