[tests] Mongo playground tests failing to connect to host.docker.internal #5274

Closed
radical opened this issue Aug 13, 2024 · 3 comments
Labels
area-integrations Issues pertaining to Aspire Integrations packages blocked
@radical
Member

radical commented Aug 13, 2024

Build: https://dev.azure.com/dnceng-public/public/_build/results?buildId=773448&view=logs&j=acc6f692-388e-5b31-5997-e154da29f5b3&t=59fafa2a-4be0-5f74-c845-0779cc54f4a3

info: Mongo.AppHost.Resources.mongo-mongoexpress[0] 45: 2024-08-12T23:04:36.124121632Z Waiting for host.docker.internal:39875...
fail: Mongo.AppHost.Resources.mongo-mongoexpress[0] 46: 2024-08-12T23:04:36.230718602Z /docker-entrypoint.sh: line 15: host.docker.internal: Name does not resolve
fail: Mongo.AppHost.Resources.mongo-mongoexpress[0] 47: 2024-08-12T23:04:36.234785505Z /docker-entrypoint.sh: line 15: /dev/tcp/host.docker.internal/39875: Invalid argument

info: Mongo.AppHost.Resources.mongo-mongoexpress[0] 48: 2024-08-12T23:04:37.233651564Z Mon Aug 12 23:04:37 UTC 2024 retrying to connect to host.docker.internal:39875 (2/10)
fail: Mongo.AppHost.Resources.mongo-mongoexpress[0] 49: 2024-08-12T23:04:37.235349865Z /docker-entrypoint.sh: line 15: host.docker.internal: Name does not resolve

info: Mongo.AppHost.Resources.mongo[0] 89: 2024-08-12T23:04:37.448420306Z {"t":{"$date":"2024-08-12T23:04:37.448+00:00"},"s":"I",  "c":"NETWORK",  "id":23016,   "ctx":"listener","msg":"Waiting for connections","attr":{"port":27017,"ssl":"off"}}

info: Mongo.AppHost.Resources.mongo-mongoexpress[0] 51: 2024-08-12T23:04:38.237924127Z Mon Aug 12 23:04:38 UTC 2024 retrying to connect to host.docker.internal:39875 (3/10)
fail: Mongo.AppHost.Resources.mongo-mongoexpress[0] 52: 2024-08-12T23:04:38.239519428Z /docker-entrypoint.sh: line 15: host.docker.internal: Name does not resolve

...

fail: Mongo.AppHost.Resources.mongo-mongoexpress[0] 73: 2024-08-12T23:04:45.271880897Z /docker-entrypoint.sh: line 15: /dev/tcp/host.docker.internal/39875: Invalid argument
info: Mongo.AppHost.Resources.mongo-mongoexpress[0] 74: 2024-08-12T23:04:45.270131894Z Mon Aug 12 23:04:45 UTC 2024 retrying to connect to host.docker.internal:39875 (10/10)

... and then fails with:

Could not connect to database using connectionString: mongodb://host.docker.internal:39875/?directConnection=true"
/app/node_modules/mongodb/lib/sdam/topology.js:285
                const timeoutError = new error_1.MongoServerSelectionError(`Server selection timed out after ${serverSelectionTimeoutMS} ms`, this.description);
                                     ^

MongoServerSelectionError: getaddrinfo ENOTFOUND host.docker.internal
    at Timeout._onTimeout (/app/node_modules/mongodb/lib/sdam/topology.js:285:38)
    at listOnTimeout (node:internal/timers:569:17)
    at process.processTimers (node:internal/timers:512:7) {
  reason: TopologyDescription {
    type: 'Single',
    servers: Map(1) {
      'host.docker.internal:39875' => ServerDescription {
        address: 'host.docker.internal:39875',
        type: 'Unknown',
        hosts: [],
        passives: [],
        arbiters: [],
        tags: {},
        minWireVersion: 0,
        maxWireVersion: 0,
        roundTripTime: -1,
        lastUpdateTime: 628461,
        lastWriteDate: 0,
        error: MongoNetworkError: getaddrinfo ENOTFOUND host.docker.internal
            at connectionFailureError (/app/node_modules/mongodb/lib/cmap/connect.js:387:20)
            at Socket.<anonymous> (/app/node_modules/mongodb/lib/cmap/connect.js:310:22)
            at Object.onceWrapper (node:events:632:26)
            at Socket.emit (node:events:517:28)
            at emitErrorNT (node:internal/streams/destroy:151:8)
            at emitErrorCloseNT (node:internal/streams/destroy:116:3)
            at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
          cause: Error: getaddrinfo ENOTFOUND host.docker.internal
              at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26) {
            errno: -3008,
            code: 'ENOTFOUND',
            syscall: 'getaddrinfo',
            hostname: 'host.docker.internal'
          },
          [Symbol(errorLabels)]: Set(1) { 'ResetPool' }
        },
        topologyVersion: null,
        setName: null,
        setVersion: null,
        electionId: null,
        logicalSessionTimeoutMinutes: null,
        primary: null,
        me: null,
        '$clusterTime': null
      }
    },
    stale: false,
    compatible: true,
    heartbeatFrequencyMS: 10000,
    localThresholdMS: 15,
    setName: null,
    maxElectionId: null,
    maxSetVersion: null,
    commonWireVersion: 0,
    logicalSessionTimeoutMinutes: null
  },
  code: undefined,
  [Symbol(errorLabels)]: Set(0) {}
}

Node.js v18.20.3
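
The log shows two symptoms of the same problem: the entrypoint's retry loop cannot open a TCP connection, and the driver's `getaddrinfo` call returns `ENOTFOUND`. A minimal Python sketch for telling a name-resolution failure apart from a refused connection might look like the following. It assumes nothing about the real `docker-entrypoint.sh` beyond what the log shows (up to 10 attempts, roughly one second apart); the function name `probe` and its parameters are hypothetical.

```python
import socket
import time


def probe(host, port, attempts=10, delay=1.0):
    """Try to reach host:port; return "ok", "dns-failure", or "unreachable"."""
    for attempt in range(1, attempts + 1):
        try:
            # Resolve first so a DNS failure is reported separately.
            socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        except socket.gaierror:
            # Retrying cannot help if the name is simply not defined.
            return "dns-failure"
        try:
            with socket.create_connection((host, port), timeout=2):
                return "ok"
        except OSError:
            # Name resolved but nothing accepted the connection; wait and retry.
            if attempt < attempts:
                time.sleep(delay)
    return "unreachable"
```

Running `probe("host.docker.internal", 39875)` inside the failing container would be expected to return `"dns-failure"` immediately, which suggests the ten retries in the log could never have succeeded.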

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": "MongoNetworkError: getaddrinfo ENOTFOUND host.docker.internal",
  "ErrorPattern": "",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=773448
Error message validated: [MongoNetworkError: getaddrinfo ENOTFOUND host.docker.internal]
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 8/13/2024 12:39:46 AM UTC

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication label Aug 13, 2024
@radical radical added blocking-clean-ci Blocking a green CI and removed area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication labels Aug 13, 2024
@radical radical changed the title [tests] Mongo playground tests failing [tests] Mongo playground tests failing to connect to host.docker.internal Aug 13, 2024
@radical radical added area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication and removed blocking-clean-ci Blocking a green CI labels Aug 13, 2024
@radical
Member Author

radical commented Aug 13, 2024

cc @eerhardt @sebastienros

radical added a commit that referenced this issue Aug 13, 2024
Use `WaitForText` since some services are really slow to start (CosmosDB emulator takes 45s locally).
Use predicates in LoggerNotificationService since some texts are more complex than just `Contains`. For instance, the CosmosDB emulator uses `Started` as the log when it's ready, but it also uses it many other times in the middle of some lines. Another example is MySql, which starts the server twice (temp server).

Disabling mongo tests - #5274

* Improve reliability of `AppHostTests.TestEndpointsReturnOk`

* Fix up usings

* fix Aspire.Playground.Tests build

* Format TestEndpoints

* Remove unnecessary line

* fix build

* Improve mysql test reliability

* Increase host startup timeout

* Disable mongo app in playground tests - #5274

---------

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>
Co-authored-by: Ankit Jain <radical@gmail.com>
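
The commit message above describes waiting on log text with a predicate rather than a plain substring check. The idea can be sketched in Python as follows. This is only an illustration, not the `WaitForText` implementation from the Aspire test code: the function name `wait_for_text`, the `occurrences` parameter, and the sample log lines are all hypothetical, and a real version would also need a timeout.

```python
import re
from typing import Callable, Iterable, Optional


def wait_for_text(lines: Iterable[str], predicate: Callable[[str], bool],
                  occurrences: int = 1) -> Optional[str]:
    """Return the line on which `predicate` matched for the Nth time, else None."""
    seen = 0
    for line in lines:
        if predicate(line):
            seen += 1
            if seen == occurrences:
                return line
    return None


# Hypothetical log lines: "Started" also appears in the middle of other lines.
cosmos_log = ["Started 1/11 partitions", "Started 11/11 partitions", "Started"]

# A plain substring check fires too early ...
early = wait_for_text(cosmos_log, lambda l: "Started" in l)
# ... while an exact-match predicate waits for the real readiness line.
ready = wait_for_text(cosmos_log, lambda l: re.fullmatch(r"\s*Started\s*", l) is not None)

# A server that starts twice (temporary server first): wait for the second match.
mysql_log = ["ready for connections", "shutdown complete", "ready for connections"]
second = wait_for_text(mysql_log, lambda l: "ready for connections" in l, occurrences=2)
```

Here `early` is the first partition line, whereas `ready` is the bare `Started` line, which is the distinction the commit message is drawing.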
radical added a commit to radical/aspire that referenced this issue Aug 13, 2024
commit ed4c375
Merge: e676955 20d00e1
Author: Ankit Jain <radical@gmail.com>
Date:   Tue Aug 13 00:50:09 2024 -0400

    Merge remote-tracking branch 'origin/main' into re-enable-playground-tests

commit 20d00e1
Author: Ankit Jain <radical@gmail.com>
Date:   Tue Aug 13 00:49:21 2024 -0400

    [tests] Try to fix random failures on playground tests build (dotnet#5271)

    * [tests] Try to fix random failures on playground tests build

    `Aspire.Playground.Tests` build failed sometimes with errors like:

    `Could not copy "D:\a\_work\1\s\artifacts\obj\CatalogModel\Release\net8.0\CatalogModel.dll" to "D:\a\_work\1\s\artifacts\bin\CatalogModel\Release\net8.0\CatalogModel.dll". Beginning retry 1 in 1000ms. The process cannot access the file 'D:\a\_work\1\s\artifacts\bin\CatalogModel\Release\net8.0\CatalogModel.dll' because it is being used by another process. The file is locked by: "Pdb2Pdb (1628)"`

    The errors are not limited to this file, but can be one from any of the
    playground projects.

    Given:

    1. `Aspire.Playground.Tests` references various playground AppHost projects
       directly with `AdditionalProperties="SkipDashboardProjectReference=true"`.
    2. `Aspire.sln` also references these playground projects.

    My thinking is that msbuild ends up treating these projects like
    `TestShop/CatalogModel/CatalogModel.csproj` as two different project
    instances:

    1. One with `SkipDashboardProjectReference=true` (from
       `Aspire.Playground.Tests`)
    2. One without that property, when building the reference directly from
       `Aspire.sln`.

    This patch avoids that entirely by defaulting to
    `SkipDashboardProjectReference=true` on CI, and on helix.

    * Skip dashboard references

    * fix build

commit e676955
Author: Ankit Jain <radical@gmail.com>
Date:   Mon Aug 12 23:33:26 2024 -0400

    [tests] Re-enable playground tests

    .. and don't set `SkipDashboardProjectReference=true` (see
    dotnet#5271).

commit e0e23d3
Author: Ankit Jain <radical@gmail.com>
Date:   Tue Aug 13 00:21:37 2024 -0400

    [ci] Avoid duplicating trx file on helix, and copy it for playground tests also (dotnet#5253)

commit c3fbc5b
Author: Ankit Jain <radical@gmail.com>
Date:   Mon Aug 12 23:31:22 2024 -0400

    Squashed commit of the following:

    commit b24e087
    Merge: ee90475 4f1a352
    Author: Ankit Jain <radical@gmail.com>
    Date:   Mon Aug 12 23:11:22 2024 -0400

        Merge remote-tracking branch 'origin/main' into fix-playground-tests-build

        # Conflicts:
        #	tests/Aspire.Playground.Tests/Aspire.Playground.Tests.csproj

    commit ee90475
    Author: Ankit Jain <radical@gmail.com>
    Date:   Mon Aug 12 18:02:05 2024 -0400

        fix build

    commit e10b215
    Author: Ankit Jain <radical@gmail.com>
    Date:   Mon Aug 12 17:45:36 2024 -0400

        Skip dashboard references

    commit cf31acc
    Author: Ankit Jain <radical@gmail.com>
    Date:   Mon Aug 12 17:02:56 2024 -0400

        [tests] Try to fix random failures on playground tests build

        `Aspire.Playground.Tests` build failed sometimes with errors like:

        `Could not copy "D:\a\_work\1\s\artifacts\obj\CatalogModel\Release\net8.0\CatalogModel.dll" to "D:\a\_work\1\s\artifacts\bin\CatalogModel\Release\net8.0\CatalogModel.dll". Beginning retry 1 in 1000ms. The process cannot access the file 'D:\a\_work\1\s\artifacts\bin\CatalogModel\Release\net8.0\CatalogModel.dll' because it is being used by another process. The file is locked by: "Pdb2Pdb (1628)"`

        The errors are not limited to this file, but can be one from any of the
        playground projects.

        Given:

        1. `Aspire.Playground.Tests` references various playground AppHost projects
           directly with `AdditionalProperties="SkipDashboardProjectReference=true"`.
        2. `Aspire.sln` also references these playground projects.

        My thinking is that msbuild ends up treating these projects like
        `TestShop/CatalogModel/CatalogModel.csproj` as two different project
        instances:

        1. One with `SkipDashboardProjectReference=true` (from
           `Aspire.Playground.Tests`)
        2. One without that property, when building the reference directly from
           `Aspire.sln`.

        This patch avoids that entirely by defaulting to
        `SkipDashboardProjectReference=true` on CI, and on helix.

commit 42cc82d
Author: Sébastien Ros <sebastienros@gmail.com>
Date:   Mon Aug 12 20:24:22 2024 -0700

    Improve reliability of `AppHostTests.TestEndpointsReturnOk` (dotnet#5251)

    Use `WaitForText` since some services are really slow to start (CosmosDB emulator takes 45s locally).
    Use predicates in LoggerNotificationService since some texts are more complex than just `Contains`. For instance, the CosmosDB emulator uses `Started` as the log when it's ready, but it also uses it many other times in the middle of some lines. Another example is MySql, which starts the server twice (temp server).

    Disabling mongo tests - dotnet#5274

    * Improve reliability of `AppHostTests.TestEndpointsReturnOk`

    * Fix up usings

    * fix Aspire.Playground.Tests build

    * Format TestEndpoints

    * Remove unnecessary line

    * fix build

    * Improve mysql test reliability

    * Increase host startup timeout

    * Disable mongo app in playground tests - dotnet#5274

    ---------

    Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>
    Co-authored-by: Ankit Jain <radical@gmail.com>

commit 990d6d0
Author: Ankit Jain <radical@gmail.com>
Date:   Mon Aug 12 23:12:54 2024 -0400

    playground.BrowserTelemetry: avoid using npm on ci/windows (dotnet#5269)

commit 4f1a352
Author: Ankit Jain <radical@gmail.com>
Date:   Mon Aug 12 22:38:17 2024 -0400

    [ci] Disable failing Aspire.Playground.Tests (dotnet#5273)

    * [ci] Disable failing Aspire.Playground.Tests

    There are two failures hitting CI right now:

    1. `Could not copy "D:\a\_work\1\s\artifacts\obj\CatalogModel\Release\net8.0\CatalogModel.dll" to "D:\a\_work\1\s\artifacts\bin\CatalogModel\Release\net8.0\CatalogModel.dll". Beginning retry 1 in 1000ms. The process cannot access the file 'D:\a\_work\1\s\artifacts\bin\CatalogModel\Release\net8.0\CatalogModel.dll' because it is being used by another process. The file is locked by: "Pdb2Pdb (1628)"`

        - Exceptions like this when building playground apps.
        - Waiting on dotnet#5271 which might
          be a fix.

    2. Individual playground tests failing
        - Waiting on dotnet#5251

    This PR disables the tests completely to get the CI in a better state,
    and can be re-enabled once the aforementioned issues are fixed.

# Conflicts:
#	tests/Aspire.Playground.Tests/Aspire.Playground.Tests.csproj
@eerhardt
Member

I believe this is because we don't have Docker Desktop installed on the CI / Helix machines. Instead we have the "vanilla Docker" installed. See https://github.com/microsoft/usvc/issues/164 for more info.
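
For context, Docker Desktop defines `host.docker.internal` inside containers automatically, while a plain Docker Engine install on Linux does not, which matches the `Name does not resolve` errors above. One commonly used workaround on Docker Engine 20.10 or later is to map the name to the special `host-gateway` value with `--add-host`. The Python sketch below only builds the argument list; the helper name `docker_run_args` is hypothetical, and this is not necessarily the approach the eventual fix took.

```python
from typing import List


def docker_run_args(image: str, add_host_gateway: bool = True) -> List[str]:
    """Build a `docker run` command line for `image`."""
    args = ["docker", "run", "--rm"]
    if add_host_gateway:
        # Maps host.docker.internal to the host's gateway IP inside the container.
        args.append("--add-host=host.docker.internal:host-gateway")
    args.append(image)
    return args


print(" ".join(docker_run_args("mongo-express")))
# prints: docker run --rm --add-host=host.docker.internal:host-gateway mongo-express
```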

@davidfowl davidfowl added area-integrations Issues pertaining to Aspire Integrations packages and removed area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication labels Sep 7, 2024
@davidfowl
Member

Fixed by #5584

@davidfowl davidfowl added this to the 9.0 milestone Sep 12, 2024
@davidfowl davidfowl added the bug label Oct 1, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Oct 31, 2024