fix(ci): increase test timeouts #656 #713

Merged
merged 5 commits into main from fix-656
Mar 25, 2021

Conversation

petermetz
Contributor

Fixes #656

Signed-off-by: Peter Somogyvari peter.somogyvari@accenture.com

@petermetz petermetz added the bug Something isn't working label Mar 23, 2021
@petermetz petermetz force-pushed the fix-656 branch 3 times, most recently from 4c96714 to eeb731a Compare March 24, 2021 02:17
@petermetz petermetz changed the title from "fix(ci): increase timeouts #656" to "fix(ci): increase test timeouts #656" Mar 24, 2021
@@ -34,7 +34,7 @@ jobs:
# experimental: true

steps:
-# FIXME: These do not work on mac OS as of 2020-12-09
+# FIzXME: These do not work on mac OS as of 2020-12-09
Contributor

nit: typo creep

Contributor Author

/facepalm cheers!

Contributor

@jonathan-m-hamilton jonathan-m-hamilton left a comment


LGTM

Bumping up the test timeouts to a full hour because under heavy load the GHA runner seems to be extremely slow, meaning that the fabric tests can take longer than half an hour each, despite the fact that these usually take about 5 minutes or less even on the slow GHA runners.
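For illustration only, a minimal tape-style sketch of a one-hour per-test timeout; the test name, the timeout option, and the place where the timeout was actually raised in this PR are assumptions, not the real change:

import test from "tape";

// Hypothetical test showing a one-hour (3,600,000 ms) per-test timeout.
// The actual PR may have raised the timeout at the test runner or CI level instead.
test("fabric ledger integration (illustrative)", { timeout: 60 * 60 * 1000 }, (t) => {
  // Long-running container setup and assertions would go here.
  t.ok(true, "placeholder assertion");
  t.end();
});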

Fixes hyperledger-cacti#656

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
…ledger-cacti#656

Potentially fixing hyperledger-cacti#656. Definitely improves the situation but it is impossible to tell in advance if this will make all the otherwise non-reproducible issues go away. Fingers crossed.

This change makes it so that the pullImage(...) method of the Containers utility class will now, by default, retry 6 times if the docker image pull has failed. The interval between retries increases exponentially (powers of two), starting from a one second delay and going up to 2^6 seconds for the final retry (and if that also fails, an AbortError is thrown by the underlying pRetry library that powers the retry mechanism).
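As an illustration, here is a minimal sketch of such an exponential backoff wrapper built on the p-retry library; the function name, the injected pullOnce callback, and the exact option values are assumptions for illustration, not the actual Containers.pullImage(...) code:

import pRetry from "p-retry";

// Hypothetical wrapper showing exponential backoff around a single image pull
// attempt: up to 6 retries with the delay doubling from one second
// (factor: 2, minTimeout: 1000), approximating the schedule described above.
export async function pullWithRetries(
  pullOnce: () => Promise<void>,
): Promise<void> {
  await pRetry(pullOnce, {
    retries: 6,
    factor: 2,
    minTimeout: 1000,
    onFailedAttempt: (error) => {
      // p-retry exposes attemptNumber and retriesLeft on the failed attempt error.
      console.warn(
        `Image pull attempt ${error.attemptNumber} failed; ${error.retriesLeft} retries left.`,
      );
    },
  });
}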

For reference, here is a randomly failed CI test execution where the logs show that DockerHub is randomly inaccessible over the network; that's another thing that makes our tests flaky, hence this commit to fix it.

https://github.com/hyperledger/cactus/runs/2178802580?check_suite_focus=true#step:8:2448

In case that link goes dead in the future, here's also the actual logs:

not ok 60 - packages/cactus-test-cmd-api-server/src/test/typescript/integration/remote-plugin-imports.test.ts # time=25389.665ms
  ---
  env:
    TS_NODE_COMPILER_OPTIONS: '{"jsx":"react"}'
  file: packages/cactus-test-cmd-api-server/src/test/typescript/integration/remote-plugin-imports.test.ts
  timeout: 1800000
  command: /opt/hostedtoolcache/node/12.13.0/x64/bin/node
  args:
    - -r
    - /home/runner/work/cactus/cactus/node_modules/ts-node/register/index.js
    - --max-old-space-size=4096
    - packages/cactus-test-cmd-api-server/src/test/typescript/integration/remote-plugin-imports.test.ts
  stdio:
    - 0
    - pipe
    - 2
  cwd: /home/runner/work/cactus/cactus
  exitCode: 1
  ...
{
    # NodeJS API server + Rust plugin work together
    [2021-03-23T20:45:51.458Z] INFO (VaultTestServer): Created VaultTestServer OK. Image FQN: vault:1.6.1
    not ok 1 Error: (HTTP code 500) server error - Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
      ---
        operator: error
        at: bound (/home/runner/work/cactus/cactus/node_modules/onetime/index.js:30:12)
        stack: |-
          Error: (HTTP code 500) server error - Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
              at /home/runner/work/cactus/cactus/packages/cactus-test-tooling/node_modules/docker-modem/lib/modem.js:301:17
              at IncomingMessage.<anonymous> (/home/runner/work/cactus/cactus/packages/cactus-test-tooling/node_modules/docker-modem/lib/modem.js:328:9)
              at IncomingMessage.emit (events.js:215:7)
              at endReadableNT (_stream_readable.js:1183:12)
              at processTicksAndRejections (internal/process/task_queues.js:80:21)
      ...

    Bail out! Error: (HTTP code 500) server error - Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
}
Bail out! Error: (HTTP code 500) server error - Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
Potentially fixing hyperledger-cacti#656. Definitely improves the situation but it is impossible to tell in advance if this will make all the otherwise non-reproducible issues go away. Fingers crossed.

An attempt to fix the mysterious error in the CI
that can be seen at the bottom.

Based off of the advice of a fellow internet user as seen here:
https://stackoverflow.com/a/61789467

No idea if this will fix the particular error that we are trying to fix or not, but we have to try. The underlying issue seems to be a bug in npm itself, but knowing that doesn't remove the need to find a workaround, so here we go...

Error logs and link:
----------------------------

Link: https://github.com/hyperledger/cactus/runs/2179881505?check_suite_focus=true#step:5:8

Logs:

Run npm ci
  npm ci
  shell: /usr/bin/bash -e {0}
  env:
    JAVA_HOME_8.0.275_x64: /opt/hostedtoolcache/jdk/8.0.275/x64
    JAVA_HOME: /opt/hostedtoolcache/jdk/8.0.275/x64
    JAVA_HOME_8_0_275_X64: /opt/hostedtoolcache/jdk/8.0.275/x64
npm ERR! cb() never called!

npm ERR! This is an error with npm itself. Please report this error at:
npm ERR!     <https://npm.community>

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
Potentially fixing hyperledger-cacti#656. Definitely improves the situation but it is impossible to tell in advance if this will make all the otherwise non-reproducible issues go away. Fingers crossed.

An attempt to fix the mysterious issue with npm ci.

Based on a true story:
https://stackoverflow.com/a/15483897

CI failure logs: https://github.com/hyperledger/cactus/runs/2179881505?check_suite_focus=true#step:5:8

Logs
------

 npm ci
  shell: /usr/bin/bash -e {0}
  env:
    JAVA_HOME_8.0.275_x64: /opt/hostedtoolcache/jdk/8.0.275/x64
    JAVA_HOME: /opt/hostedtoolcache/jdk/8.0.275/x64
    JAVA_HOME_8_0_275_X64: /opt/hostedtoolcache/jdk/8.0.275/x64
npm ERR! cb() never called!

npm ERR! This is an error with npm itself. Please report this error at:
npm ERR!     <https://npm.community>

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
…ger-cacti#656

This is yet another attempt at potentially fixing all the remaining CI
flakes that only happen on the GitHub Action runners but never on
developer machines.

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
@petermetz petermetz merged commit af9f851 into hyperledger-cacti:main Mar 25, 2021
@petermetz petermetz deleted the fix-656 branch March 26, 2021 01:12
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

fix(api-client): default consortium provider timeout test flake
4 participants