Flaky test - QdrantFunctionalTests.WithDataShouldPersistStateBetweenUsages #5140

Closed
2 tasks
radical opened this issue Aug 1, 2024 · 4 comments
Labels
area-integrations Issues pertaining to Aspire Integrations packages blocking-clean-ci Blocking a green CI flaky-test testing ☑️ untriaged New issue has not been triaged

Comments

@radical
Member

radical commented Aug 1, 2024

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=763013
Build error leg or test failing: Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.WithDataShouldPersistStateBetweenUsages(useVolume: True)
Pull request: #5099

Error message
Grpc.Core.RpcException : Status(StatusCode="Cancelled", Detail="Call canceled by the client.", DebugException="System.OperationCanceledException: The operation was canceled.")
---- System.OperationCanceledException : The operation was canceled.

Stack trace
   at Qdrant.Client.QdrantClient.SearchAsync(String collectionName, ReadOnlyMemory`1 vector, Filter filter, SearchParams searchParams, UInt64 limit, UInt64 offset, WithPayloadSelector payloadSelector, WithVectorsSelector vectorsSelector, Nullable`1 scoreThreshold, String vectorName, ReadConsistency readConsistency, ShardKeySelector shardKeySelector, Nullable`1 sparseIndices, Nullable`1 timeout, CancellationToken cancellationToken)
   at Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.<>c__DisplayClass6_1.<<WithDataShouldPersistStateBetweenUsages>b__1>d.MoveNext() in /_/tests/Aspire.Hosting.Qdrant.Tests/QdrantFunctionalTests.cs:line 187
--- End of stack trace from previous location ---
   at Polly.ResiliencePipeline.<>c.<<ExecuteAsync>b__3_0>d.MoveNext()
--- End of stack trace from previous location ---
   at Polly.Outcome`1.GetResultOrRethrow()
   at Polly.ResiliencePipeline.ExecuteAsync(Func`2 callback, CancellationToken cancellationToken)
   at Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.WithDataShouldPersistStateBetweenUsages(Boolean useVolume) in /_/tests/Aspire.Hosting.Qdrant.Tests/QdrantFunctionalTests.cs:line 183
   at Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.WithDataShouldPersistStateBetweenUsages(Boolean useVolume) in /_/tests/Aspire.Hosting.Qdrant.Tests/QdrantFunctionalTests.cs:line 196
--- End of stack trace from previous location ---
----- Inner Stack Trace -----

var pipeline = new ResiliencePipelineBuilder()
    .AddRetry(new() { MaxRetryAttempts = 10, Delay = TimeSpan.FromSeconds(1), ShouldHandle = new PredicateBuilder().Handle<RpcException>() })
    .Build();

await pipeline.ExecuteAsync(async token =>
{
    var qdrantClient = host.Services.GetRequiredService<QdrantClient>();
    var results = await qdrantClient.SearchAsync(CollectionName, s_testVector, limit: 1, cancellationToken: token);
    Assert.Collection(results,
        r => Assert.Equal("Test", r.Payload["title"].StringValue));
}, cts.Token);

This code is being executed in a Resilience pipeline watching for RpcException.

  • Does it need a longer delay, or more attempts spread over a longer duration?
  • Also, the retry attempts should be logged so that the actual error can be seen.
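
A minimal sketch of what logging the retry attempts could look like, using Polly's `OnRetry` callback on `RetryStrategyOptions`. The class name `RetryLoggingSketch`, the `Build` helper, and the `log` callback are hypothetical and not part of the test suite; the delay and attempt count are parameters so that a longer window can be tried without touching the pipeline itself.

```csharp
using System;
using Polly;
using Polly.Retry;

// Hypothetical helper, not taken from the Aspire test suite.
public static class RetryLoggingSketch
{
    // Builds a retry pipeline that reports every failed attempt through `log`
    // before waiting `delay` and trying again, up to `maxRetryAttempts` retries.
    public static ResiliencePipeline Build<TException>(
        Action<string> log, TimeSpan delay, int maxRetryAttempts)
        where TException : Exception
    {
        return new ResiliencePipelineBuilder()
            .AddRetry(new RetryStrategyOptions
            {
                MaxRetryAttempts = maxRetryAttempts,
                Delay = delay,
                ShouldHandle = new PredicateBuilder().Handle<TException>(),
                OnRetry = args =>
                {
                    // AttemptNumber is zero-based; Outcome.Exception is the
                    // failure that triggered this retry.
                    log($"Attempt {args.AttemptNumber} failed: " +
                        $"{args.Outcome.Exception?.Message}; retrying in {args.RetryDelay}");
                    return default;
                }
            })
            .Build();
    }
}
```

In the test this might be called as `RetryLoggingSketch.Build<RpcException>(testOutputHelper.WriteLine, TimeSpan.FromSeconds(2), 20)`, assuming an xunit `ITestOutputHelper` is available. With the attempts logged, the exception that caused each retry would show up in the test output, rather than only the final `Cancelled` status that probably surfaces when `cts.Token` fires.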

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": "",
  "ErrorPattern": "RpcException.*Call canceled by the client",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}
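
As a quick sanity check of the `ErrorPattern` above, here is a small sketch that applies it as a .NET regular expression to the error line from this report. It assumes the known-issue tooling matches the pattern against individual log lines; the class name `KnownIssuePatternSketch` is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for checking the pattern locally.
public static class KnownIssuePatternSketch
{
    // The pattern from the known-issue JSON in this report.
    public const string ErrorPattern = "RpcException.*Call canceled by the client";

    // True when the pattern matches somewhere in the given line.
    // '.' does not match newlines by default, so both parts of the
    // pattern have to appear on the same line.
    public static bool Matches(string logLine) =>
        Regex.IsMatch(logLine, ErrorPattern);
}
```

Under that assumption, the pattern matches the `Grpc.Core.RpcException : Status(StatusCode="Cancelled", Detail="Call canceled by the client.", ...)` line, but not the inner `System.OperationCanceledException : The operation was canceled.` line on its own.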

cc @eerhardt @sebastienros

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=763013
Error message validated: [RpcException.*Call canceled by the client]
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 8/1/2024 6:49:28 PM UTC

Report

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
| --- | --- | --- |
| 0 | 0 | 0 |
@radical radical added the blocking-clean-ci Blocking a green CI label Aug 1, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Aug 1, 2024
@radical radical added area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication testing ☑️ and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Aug 1, 2024
@eerhardt
Member

eerhardt commented Aug 1, 2024

Looking at the logs I see:

fail: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      could not determine host address and port for container port	{"Container": {"name":"qdrant-mkbsaegq-59a393fe"}, "Reconciliation": 5, "error": "container '/qdrant-mkbsaegq-59a393fe' is not running: exited"}
fail: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      could not create Endpoint object	{"Container": {"name":"qdrant-mkbsaegq-59a393fe"}, "Reconciliation": 5, "ServiceName": "qdrant-grpc-59a393fe", "Workload": "/qdrant-mkbsaegq-59a393fe", "error": "container '/qdrant-mkbsaegq-59a393fe' is not running: exited"}
fail: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      could not determine host address and port for container port	{"Container": {"name":"qdrant-mkbsaegq-59a393fe"}, "Reconciliation": 5, "error": "container '/qdrant-mkbsaegq-59a393fe' is not running: exited"}
fail: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      could not create Endpoint object	{"Container": {"name":"qdrant-mkbsaegq-59a393fe"}, "Reconciliation": 5, "ServiceName": "qdrant-http-59a393fe", "Workload": "/qdrant-mkbsaegq-59a393fe", "error": "container '/qdrant-

Did the container fail to start? Is there a way to get the containers logs?

@radical
Member Author

radical commented Aug 1, 2024

... and the first one failed during shutdown/deletion(?):

k8s.Autorest.HttpOperationException: Operation returned an invalid status code 'Conflict', response body {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Operation cannot be fulfilled on containers.usvc-dev.developer.microsoft.com \"qdrant-azvxadzz-73d029b3\": the object has been modified; please apply your changes to the latest version and try again","reason":"Conflict","details":{"name":"qdrant-azvxadzz-73d029b3","group":"usvc-dev.developer.microsoft.com","kind":"containers"},"code":409}
   at k8s.Kubernetes.SendRequestRaw(String requestContent, HttpRequestMessage httpRequest, CancellationToken cancellationToken)
   at k8s.AbstractKubernetes.ICustomObjectsOperations_DeleteClusterCustomObjectWithHttpMessagesAsync[T](String group, String version, String plural, String name, V1DeleteOptions body, Nullable`1 gracePeriodSeconds, Nullable`1 orphanDependents, String propagationPolicy, String dryRun, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
   at k8s.AbstractKubernetes.k8s.ICustomObjectsOperations.DeleteClusterCustomObjectWithHttpMessagesAsync(String group, String version, String plural, String name, V1DeleteOptions body, Nullable`1 gracePeriodSeconds, Nullable`1 orphanDependents, String propagationPolicy, String dryRun, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
   at Aspire.Hosting.Dcp.KubernetesService.<>c__DisplayClass18_0`1.<<DeleteAsync>b__0>d.MoveNext() in /_/src/Aspire.Hosting/Dcp/KubernetesService.cs:line 165
--- End of stack trace from previous location ---
   at Aspire.Hosting.Dcp.KubernetesService.ExecuteWithRetry[TResult](DcpApiOperationType operationType, String resourceType, Func`2 operation, CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/KubernetesService.cs:line 308
   at Aspire.Hosting.Dcp.ApplicationExecutor.DeleteResourcesAsync[RT](String resourceType, CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/ApplicationExecutor.cs:line 1803

Re: container logs, we'll need to add something that explicitly fetches the logs, or pipe the logs to the logger.

@Alirexaa
Contributor

Alirexaa commented Aug 1, 2024

I use ResourceLoggerForwarderService to forward container logs to the logger :))

private TestDistributedApplicationBuilder CreateDistributedApplicationBuilder()
{
    var builder = TestDistributedApplicationBuilder.CreateWithTestContainerRegistry();
    builder.Services.AddXunitLogging(testOutputHelper);
    builder.Services.AddHostedService<ResourceLoggerForwarderService>();

    return builder;
}

@davidfowl davidfowl added area-integrations Issues pertaining to Aspire Integrations packages and removed area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication labels Sep 7, 2024
@joperezr joperezr added the untriaged New issue has not been triaged label Oct 15, 2024
@eerhardt
Member

This hasn't occurred in the past month. Closing for now. Please reopen if it occurs again.

@eerhardt eerhardt closed this as not planned (won't fix, can't repro, duplicate, stale) Jan 14, 2025
@github-actions github-actions bot locked and limited conversation to collaborators Feb 14, 2025