This repository has been archived by the owner on Dec 8, 2021. It is now read-only.

[Flake]: Windows integration test timeouts #923

Closed
mr-salty opened this issue Oct 3, 2019 · 7 comments
Labels
api: spanner Issues related to the googleapis/google-cloud-cpp-spanner API. priority: p2 Lowest priority. Fix may not be included in next release. 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@mr-salty
Contributor

mr-salty commented Oct 3, 2019

We've been seeing some timeouts in the Spanner integration tests on Windows. I included one successful run and three timeout runs. In the timeout runs I see these two lines in the output:

E1003 19:31:17.781000000  1080 external/com_github_grpc_grpc/src/core/lib/security/credentials/alts/check_gcp_environment_windows.cc:72] CreateFile failed (3).
unknown file: error: SEH exception with code 0xc0000005 thrown in the test body.

If the "CreateFile failed" (source) and "unknown file" errors are the real issue and not a red herring, they appear to cause some test cases to fail and others to hang indefinitely until the timeout occurs. So the CreateFile flakiness is one issue, but the tests should also fail quickly when an error occurs.
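
For reference, the two numeric codes in those log lines can be decoded with a small lookup. This is only a sketch: the function names are hypothetical, and the tables list just a handful of values from the Windows SDK headers (`winerror.h` for the `CreateFile` error, `ntstatus.h` for the SEH code), not a complete mapping.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: names for a few Win32 error codes, i.e. the number
// in parentheses after "CreateFile failed".
inline std::string DescribeWin32Error(std::uint32_t code) {
  switch (code) {
    case 2: return "ERROR_FILE_NOT_FOUND";
    case 3: return "ERROR_PATH_NOT_FOUND";
    case 5: return "ERROR_ACCESS_DENIED";
    default: return "unrecognized Win32 error";
  }
}

// Hypothetical helper: names for a few structured-exception (SEH) codes, as
// printed by googletest in "SEH exception with code 0x...".
inline std::string DescribeSehCode(std::uint32_t code) {
  switch (code) {
    case 0xC0000005u: return "EXCEPTION_ACCESS_VIOLATION";
    case 0xC0000094u: return "EXCEPTION_INT_DIVIDE_BY_ZERO";
    case 0xC00000FDu: return "EXCEPTION_STACK_OVERFLOW";
    default: return "unrecognized SEH code";
  }
}
```

Read this way, `CreateFile failed (3)` means the path was not found, and `0xc0000005` is an access violation in the test body or in `SetUp()`.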

Note that failure 1 occurs in spanner_client_instance_admin_integration_test: the InstanceAdminClientTest.InstanceConfig test FAILED, while InstanceAdminClientTest.InstanceIam was presumably the one that timed out. In the successful cases this test passes in under 2 seconds, versus a timeout of 15 minutes.

Failures 2 and 3 occur in spanner_client_rpc_failure_threshold_integration_test, which has only a single test, RpcFailureThresholdTest.ExecuteSqlDeleteErrors. This test typically takes ~2 minutes to run, versus the timeout of 15 minutes.


success

INFO: Build completed successfully, 7 total actions
//google/cloud/spanner/integration_tests:spanner_client_client_integration_test PASSED in 88.0s
//google/cloud/spanner/integration_tests:spanner_client_client_stress_test PASSED in 85.2s
//google/cloud/spanner/integration_tests:spanner_client_database_admin_integration_test PASSED in 38.4s
//google/cloud/spanner/integration_tests:spanner_client_instance_admin_integration_test PASSED in 1.8s
//google/cloud/spanner/integration_tests:spanner_client_rpc_failure_threshold_integration_test PASSED in 112.2s
//google/cloud/spanner/samples:spanner_client_samples                    PASSED in 45.1s

failure 1

TIMEOUT: //google/cloud/spanner/integration_tests:spanner_client_instance_admin_integration_test (Summary)
      C:/b/4rkbi76t/execroot/com_github_googleapis_google_cloud_cpp_spanner/bazel-out/x64_windows-fastbuild/testlogs/google/cloud/spanner/integration_tests/spanner_client_instance_admin_integration_test/test.log
INFO: From Testing //google/cloud/spanner/integration_tests:spanner_client_instance_admin_integration_test:
==================== Test output for //google/cloud/spanner/integration_tests:spanner_client_instance_admin_integration_test:
[==========] Running 4 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 3 tests from InstanceAdminClientTest
[ RUN      ] InstanceAdminClientTest.InstanceReadOperations
E1003 19:31:17.781000000  1080 external/com_github_grpc_grpc/src/core/lib/security/credentials/alts/check_gcp_environment_windows.cc:72] CreateFile failed (3).
[       OK ] InstanceAdminClientTest.InstanceReadOperations (285 ms)
[ RUN      ] InstanceAdminClientTest.InstanceConfig
unknown file: error: SEH exception with code 0xc0000005 thrown in the test body.
[  FAILED  ] InstanceAdminClientTest.InstanceConfig (46 ms)
[ RUN      ] InstanceAdminClientTest.InstanceIam
================================================================================
INFO: Elapsed time: 1298.679s, Critical Path: 900.38s
INFO: 8 processes: 8 local.
INFO: Build completed, 1 test FAILED, 7 total actions
//google/cloud/spanner/integration_tests:spanner_client_client_integration_test PASSED in 104.0s
//google/cloud/spanner/integration_tests:spanner_client_client_stress_test PASSED in 87.1s
//google/cloud/spanner/integration_tests:spanner_client_database_admin_integration_test PASSED in 41.5s
//google/cloud/spanner/integration_tests:spanner_client_rpc_failure_threshold_integration_test PASSED in 110.8s
//google/cloud/spanner/samples:spanner_client_samples                    PASSED in 51.4s
//google/cloud/spanner/integration_tests:spanner_client_instance_admin_integration_test TIMEOUT in 900.0s
  C:/b/4rkbi76t/execroot/com_github_googleapis_google_cloud_cpp_spanner/bazel-out/x64_windows-fastbuild/testlogs/google/cloud/spanner/integration_tests/spanner_client_instance_admin_integration_test/test.log

failure 2

TIMEOUT: //google/cloud/spanner/integration_tests:spanner_client_rpc_failure_threshold_integration_test (Summary)
      C:/b/4rkbi76t/execroot/com_github_googleapis_google_cloud_cpp_spanner/bazel-out/x64_windows-fastbuild/testlogs/google/cloud/spanner/integration_tests/spanner_client_rpc_failure_threshold_integration_test/test.log
INFO: From Testing //google/cloud/spanner/integration_tests:spanner_client_rpc_failure_threshold_integration_test:
==================== Test output for //google/cloud/spanner/integration_tests:spanner_client_rpc_failure_threshold_integration_test:
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from RpcFailureThresholdTest
[ RUN      ] RpcFailureThresholdTest.ExecuteSqlDeleteErrors
E1003 18:47:30.819000000  4224 external/com_github_grpc_grpc/src/core/lib/security/credentials/alts/check_gcp_environment_windows.cc:72] CreateFile failed (3).
Creating database [db-081j-3gfm0992dyr5ydq01rxbfg] and table unknown file: error: SEH exception with code 0xc0000005 thrown in SetUp().
Dropping database db-081j-3gfm0992dyr5ydq01rxbfg================================================================================
[1,569 / 1,570] 4 / 6 tests, 1 failed; Testing //google/cloud/spanner/integration_tests:spanner_client_client_integration_test; 2s local
[1,575 / 1,576] 5 / 6 tests, 1 failed; Testing //google/cloud/spanner/integration_tests:spanner_client_client_stress_test; 61s local
INFO: Elapsed time: 1175.162s, Critical Path: 900.34s
INFO: 8 processes: 8 local.
INFO: Build completed, 1 test FAILED, 7 total actions
//google/cloud/spanner/integration_tests:spanner_client_client_integration_test PASSED in 91.3s
//google/cloud/spanner/integration_tests:spanner_client_client_stress_test PASSED in 82.0s
//google/cloud/spanner/integration_tests:spanner_client_database_admin_integration_test PASSED in 37.3s
//google/cloud/spanner/integration_tests:spanner_client_instance_admin_integration_test PASSED in 1.7s
//google/cloud/spanner/samples:spanner_client_samples                    PASSED in 59.1s
//google/cloud/spanner/integration_tests:spanner_client_rpc_failure_threshold_integration_test TIMEOUT in 900.0s
  C:/b/4rkbi76t/execroot/com_github_googleapis_google_cloud_cpp_spanner/bazel-out/x64_windows-fastbuild/testlogs/google/cloud/spanner/integration_tests/spanner_client_rpc_failure_threshold_integration_test/test.log

failure 3

TIMEOUT: //google/cloud/spanner/integration_tests:spanner_client_rpc_failure_threshold_integration_test (Summary)
      C:/b/4rkbi76t/execroot/com_github_googleapis_google_cloud_cpp_spanner/bazel-out/x64_windows-fastbuild/testlogs/google/cloud/spanner/integration_tests/spanner_client_rpc_failure_threshold_integration_test/test.log
INFO: From Testing //google/cloud/spanner/integration_tests:spanner_client_rpc_failure_threshold_integration_test:
==================== Test output for //google/cloud/spanner/integration_tests:spanner_client_rpc_failure_threshold_integration_test:
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from RpcFailureThresholdTest
[ RUN      ] RpcFailureThresholdTest.ExecuteSqlDeleteErrors
E1003 01:43:11.119000000  5968 external/com_github_grpc_grpc/src/core/lib/security/credentials/alts/check_gcp_environment_windows.cc:72] CreateFile failed (3).
Creating database [db-cdxpee59fvyp0iwsua8-sj5fzto] and table unknown file: error: SEH exception with code 0xc0000005 thrown in SetUp().
Dropping database db-cdxpee59fvyp0iwsua8-sj5fzto================================================================================
INFO: Elapsed time: 1201.286s, Critical Path: 900.40s
INFO: 8 processes: 8 local.
INFO: Build completed, 1 test FAILED, 7 total actions
//google/cloud/spanner/integration_tests:spanner_client_client_integration_test PASSED in 110.0s
//google/cloud/spanner/integration_tests:spanner_client_client_stress_test PASSED in 88.2s
//google/cloud/spanner/integration_tests:spanner_client_database_admin_integration_test PASSED in 36.2s
//google/cloud/spanner/integration_tests:spanner_client_instance_admin_integration_test PASSED in 1.6s
//google/cloud/spanner/samples:spanner_client_samples                    PASSED in 61.7s
//google/cloud/spanner/integration_tests:spanner_client_rpc_failure_threshold_integration_test TIMEOUT in 900.0s
  C:/b/4rkbi76t/execroot/com_github_googleapis_google_cloud_cpp_spanner/bazel-out/x64_windows-fastbuild/testlogs/google/cloud/spanner/integration_tests/spanner_client_rpc_failure_threshold_integration_test/test.log
@mr-salty mr-salty added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Oct 3, 2019
@mr-salty
Contributor Author

mr-salty commented Oct 4, 2019

From chat, @tmatsuo and @coryan say the failure is a manifestation of grpc/grpc#16872

Can we do anything to make the tests fail (quickly) instead of timing out in this case? If not, then I don't think there's anything we can do except wait for the grpc bug to be fixed.
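
One possible way to make a hung test fail quickly is an in-process watchdog. The sketch below is an assumption, not something that exists in google-cloud-cpp-spanner or googletest: `Watchdog` is a hypothetical helper that runs a callback if it has not been destroyed before its deadline. Whether it would have fired during the hang seen here has not been verified.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: invokes `on_timeout` if the object is still alive
// once `deadline` has elapsed. Destroying it earlier cancels the watchdog.
class Watchdog {
 public:
  Watchdog(std::chrono::milliseconds deadline, std::function<void()> on_timeout)
      : on_timeout_(std::move(on_timeout)) {
    assert(on_timeout_ && "a timeout callback is required");
    thread_ = std::thread([this, deadline] {
      std::unique_lock<std::mutex> lk(mu_);
      // wait_for() returns false only if the deadline passed with done_ unset.
      bool const finished = cv_.wait_for(lk, deadline, [this] { return done_; });
      lk.unlock();
      if (!finished) on_timeout_();
    });
  }

  // Cancels the watchdog and blocks until the watcher thread has exited.
  ~Watchdog() {
    {
      std::lock_guard<std::mutex> lk(mu_);
      done_ = true;
    }
    cv_.notify_one();
    thread_.join();
  }

  Watchdog(Watchdog const&) = delete;
  Watchdog& operator=(Watchdog const&) = delete;

 private:
  std::function<void()> on_timeout_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool done_ = false;
  std::thread thread_;  // declared last; started after the other members exist
};
```

A test fixture could hold a `Watchdog` as a member, with a deadline of a few minutes and a callback that prints a message and calls `std::_Exit(1)`, so a stuck test case ends the process instead of waiting for the runner. Separately, Bazel's `timeout` attribute on the test target (or the `--test_timeout` flag) can lower the 900-second ceiling for tests that normally finish in seconds.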

@coryan coryan added the priority: p2 Lowest priority. Fix may not be included in next release. label Oct 4, 2019
@coryan
Contributor

coryan commented Oct 4, 2019

I think the timeout might be caused by MSVC opening a dialog when the code crashes and was compiled in debug mode. The answer is to compile in release mode (where you just get a crash) or to disable the "just in time" debugger:

https://stackoverflow.com/questions/1893567/how-to-stop-just-in-time-debugging-messages-blocking-a-buildserver
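
As a complement to the machine-level setting described in that link, a test binary can also ask Windows and the MSVC debug runtime not to show dialogs from inside the process. The following is a minimal sketch under stated assumptions: `DisableCrashDialogs` is a hypothetical name, and it is not confirmed that these calls suppress the specific dialog suspected here. The Win32 and CRT functions it calls (`SetErrorMode`, `_CrtSetReportMode`, `_CrtSetReportFile`, `_set_abort_behavior`) are existing APIs; the preprocessor guards let the same file compile on other platforms, where the function does nothing.

```cpp
#include <cstdlib>

#ifdef _WIN32
#include <windows.h>
#endif
#ifdef _MSC_VER
#include <crtdbg.h>
#endif

// Hypothetical helper: call once at the start of main(), before any test
// runs. Returns true if Windows-specific settings were applied.
inline bool DisableCrashDialogs() {
#ifdef _WIN32
  // Do not show the critical-error or Windows Error Reporting dialogs.
  SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
#ifdef _MSC_VER
  // Debug CRT: send assertion and error reports to stderr, not a dialog.
  _CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_FILE);
  _CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR);
  _CrtSetReportMode(_CRT_ERROR, _CRTDBG_MODE_FILE);
  _CrtSetReportFile(_CRT_ERROR, _CRTDBG_FILE_STDERR);
  // abort(): skip the message box and the error-reporting hand-off.
  _set_abort_behavior(0, _WRITE_ABORT_MSG | _CALL_REPORTFAULT);
#endif
  return true;
#else
  return false;  // Nothing to configure on non-Windows platforms.
#endif
}
```

With settings like these a crash should terminate the process with a non-zero exit code, which the test runner reports as a failure rather than waiting for someone to dismiss a dialog.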

@coryan
Contributor

coryan commented Oct 4, 2019

@coryan coryan changed the title Flake: Windows integration test timeouts [Flake]: Windows integration test timeouts Oct 10, 2019
@coryan
Contributor

coryan commented Dec 19, 2019

FWIW, my fixes for gRPC on Windows made it into gRPC-1.26.0. I am working on microsoft/vcpkg#9363, which will bring the fixes to our Windows+CMake builds; @scotthart is working on upgrading the Windows+Bazel builds to 1.26.0 too.

@google-cloud-label-sync google-cloud-label-sync bot added the api: spanner Issues related to the googleapis/google-cloud-cpp-spanner API. label Jan 29, 2020
@coryan
Contributor

coryan commented Feb 1, 2020

@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Mar 31, 2020
@coryan coryan removed the 🚨 This issue needs some love. label Apr 1, 2020
@coryan
Contributor

coryan commented Apr 1, 2020

The problem does not seem to have recurred since we moved to gRPC-1.26.x. If there are no repeats by 2020-05-01 we should close this bug.

@coryan
Contributor

coryan commented Apr 1, 2020

Actually, we moved to 1.26.x in #1234, and the bug reported above cannot happen on that version of gRPC (there are no CreateFile() calls).
I think we can close this.

@coryan coryan closed this as completed Apr 1, 2020
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Apr 1, 2020
3 participants