[CI][C++] Fix arrow-s3fs-test timeouts on macOS C++ job #40410
Comments
When running this test locally on my non-AMD64 Mac, I actually get three failures:
I see in this job we run ctest with,
My hunch is that it's possible the test is failing and ctest is retrying. My local machine is fast enough that it can get 3 repeats in before the timeout but it's possible GitHub's runner is slow enough that the timeout happens first. Another thing I noticed is that, when ctest automatically re-attempts the test as requested, it still prints "attempt 1/1" each time:
|
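To make the hunch above concrete, here is a minimal sketch of the arithmetic. It assumes a single time budget shared by all attempts, which is only the hypothesis in this comment and not confirmed ctest behavior; the function name and the numbers in the usage note are hypothetical.

```python
def attempts_completed(attempt_seconds, budget_seconds, max_attempts):
    """Return how many full attempts finish before a shared time budget runs out.

    Assumption: all attempts draw from one overall budget. Whether ctest
    actually works this way is not established in this thread.
    """
    if attempt_seconds <= 0:
        raise ValueError("attempt_seconds must be positive")
    # Whole attempts that fit in the budget, capped at the configured repeat count.
    return min(max_attempts, int(budget_seconds // attempt_seconds))
```

With a hypothetical 300 second budget, a fast machine at 60 seconds per attempt would fit all 3 repeats, while a slower runner at 120 seconds per attempt would only finish 2 before being cut off.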
I reverted my minio and mc binaries to the ones we pin for the tests and the above failures go away, so I don't think that's what's going on. Though the above errors may be something to be aware of if we ever need to upgrade minio/mc. |
Relevant link -> #34671 |
I haven't been able to reproduce locally (without modifying anything) but I have been able to reproduce once on CI with a stripped-down workflow in this run https://github.com/amoeba/arrow/actions/runs/8194916281/job/22416431654 so I'm going to try debugging that with tmate. |
Good luck with that, my attempts at reproducing while logging in using tmate actually failed here: |
### Rationale for this change

We can use GitHub hosted M1 macOS runner.

### What changes are included in this PR?

* Add a job on macos-14
* Update expected L2 CPU cache range to 32KiB-12MiB from 32KiB-8MiB because M1 macOS runner has 12MiB
* Disable arrow-s3fs-test for now. It'll be fixed by GH-40410

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* GitHub Issue: #40082

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
I'm having success reproducing the timeout by SSH'ing into GHA so this has now turned into me figuring out how to get useful information out of the hanging test with dtrace. I did a bit of searching to find out how to get thread stack traces from a running process and ended up trying But so far, it produces this more often than not, which is interesting,
|
More easily perhaps, you can use |
Thanks. A
|
Thanks a lot @amoeba . It appears therefore that it is timing out when trying to join a thread using
Edit: on second look, the sentence above is incorrect. There is a thread pending on |
It would be nice if you could try again to catch another backtrace, so that we can see if it's always the same test timing out. If it is, we could just disable that test on macOS... |
Even better: can you try getting a traceback later? It seems DNS resolution timeout on macOS is 30 seconds. |
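One way to check the DNS theory on the runner would be to time a single name lookup. This is a rough sketch using Python's standard `socket.getaddrinfo`; the helper name `time_lookup` and the injectable `resolver` parameter are assumptions added for illustration, not something from the Arrow test suite.

```python
import socket
import time


def time_lookup(host, port=None, resolver=socket.getaddrinfo):
    """Resolve `host` once and return (elapsed_seconds, succeeded).

    `resolver` is injectable (hypothetical parameter) so the timing logic
    can be exercised without touching the network.
    """
    start = time.monotonic()
    try:
        resolver(host, port)
        succeeded = True
    except socket.gaierror:
        # Name resolution failed; how long it took is what matters here.
        succeeded = False
    return time.monotonic() - start, succeeded
```

If a lookup for a host that cannot resolve takes close to 30 seconds on the macOS runner, that would be consistent with the timeout mentioned above; a fast failure would point elsewhere.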
I've tried reenabling it but it still times out |
@amoeba Could you make progress on the diagnosis here? Otherwise, perhaps we can simply skip the |
Yeah, I can pick this back up this week. Sorry I dropped it! |
I spent some time Saturday trying to reproduce the timeout like I'd done before and couldn't until I was about to head out so I'll keep trying this week. |
I think I was able to reproduce the hang this morning and I collected 6 backtraces spread out over 1-2 min that were all essentially identical:
bt all output
(lldb) bt all
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x000000019213bee8 libsystem_kernel.dylib`__select + 8
frame #1: 0x00000001a9c3cc64 libcurl.4.dylib`Curl_poll + 516
frame #2: 0x00000001a9c3501c libcurl.4.dylib`multi_wait + 640
frame #3: 0x00000001a9c0eda8 libcurl.4.dylib`curl_easy_perform + 268
frame #4: 0x0000000105b7f720 libaws-cpp-sdk-core.dylib`Aws::Http::CurlHttpClient::MakeRequest(std::__1::shared_ptr<Aws::Http::HttpRequest> const&, Aws::Utils::RateLimits::RateLimiterInterface*, Aws::Utils::RateLimits::RateLimiterInterface*) const + 3524
frame #5: 0x0000000105b425a8 libaws-cpp-sdk-core.dylib`std::__1::shared_ptr<Aws::Http::HttpResponse> smithy::components::tracing::TracingUtils::MakeCallWithTiming<std::__1::shared_ptr<Aws::Http::HttpResponse>>(std::__1::function<std::__1::shared_ptr<Aws::Http::HttpResponse> ()>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, smithy::components::tracing::Meter const&, std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>>&&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 84
frame #6: 0x0000000105b4e24c libaws-cpp-sdk-core.dylib`Aws::Client::AWSClient::AttemptOneRequest(std::__1::shared_ptr<Aws::Http::HttpRequest> const&, Aws::AmazonWebServiceRequest const&, char const*, char const*, char const*) const + 1772
frame #7: 0x0000000105b4c3e8 libaws-cpp-sdk-core.dylib`Aws::Client::AWSClient::AttemptExhaustively(Aws::Http::URI const&, Aws::AmazonWebServiceRequest const&, Aws::Http::HttpMethod, char const*, char const*, char const*) const + 948
frame #8: 0x0000000105b55a0c libaws-cpp-sdk-core.dylib`Aws::Client::AWSXMLClient::MakeRequest(Aws::Http::URI const&, Aws::AmazonWebServiceRequest const&, Aws::Http::HttpMethod, char const*, char const*, char const*) const + 64
frame #9: 0x0000000105eec284 libaws-cpp-sdk-s3.dylib`std::__1::__function::__func<Aws::S3::S3Client::PutObject(Aws::S3::Model::PutObjectRequest const&) const::$_105, std::__1::allocator<Aws::S3::S3Client::PutObject(Aws::S3::Model::PutObjectRequest const&) const::$_105>, Aws::Utils::Outcome<Aws::S3::Model::PutObjectResult, Aws::S3::S3Error> ()>::operator()() + 1624
frame #10: 0x0000000105e94050 libaws-cpp-sdk-s3.dylib`Aws::Utils::Outcome<Aws::S3::Model::PutObjectResult, Aws::S3::S3Error> smithy::components::tracing::TracingUtils::MakeCallWithTiming<Aws::Utils::Outcome<Aws::S3::Model::PutObjectResult, Aws::S3::S3Error>>(std::__1::function<Aws::Utils::Outcome<Aws::S3::Model::PutObjectResult, Aws::S3::S3Error> ()>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, smithy::components::tracing::Meter const&, std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>>&&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 84
frame #11: 0x0000000105e936c0 libaws-cpp-sdk-s3.dylib`Aws::S3::S3Client::PutObject(Aws::S3::Model::PutObjectRequest const&) const + 1512
frame #12: 0x0000000110ef0060 libarrow.1800.dylib`arrow::fs::S3FileSystem::Impl::CreateEmptyDir(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string_view<char, std::__1::char_traits<char>>) + 524
frame #13: 0x0000000110eeecf4 libarrow.1800.dylib`arrow::fs::S3FileSystem::CreateDir(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, bool) + 1820
frame #14: 0x00000001046e0a30 arrow-s3fs-test`arrow::fs::FileSystem::CreateDir(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 56
frame #15: 0x00000001046e349c arrow-s3fs-test`arrow::fs::TestS3FS_CreateDir_Test::TestBody() + 4432
frame #16: 0x00000001060dfad8 libarrow_gtestd.1.11.0.dylib`void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 132
frame #17: 0x000000010609d8b8 libarrow_gtestd.1.11.0.dylib`void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 96
frame #18: 0x000000010609d808 libarrow_gtestd.1.11.0.dylib`testing::Test::Run() + 192
frame #19: 0x000000010609e8ac libarrow_gtestd.1.11.0.dylib`testing::TestInfo::Run() + 244
frame #20: 0x000000010609f99c libarrow_gtestd.1.11.0.dylib`testing::TestSuite::Run() + 276
frame #21: 0x00000001060ad70c libarrow_gtestd.1.11.0.dylib`testing::internal::UnitTestImpl::RunAllTests() + 1008
frame #22: 0x00000001060e6d6c libarrow_gtestd.1.11.0.dylib`bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 132
frame #23: 0x00000001060ad0e0 libarrow_gtestd.1.11.0.dylib`bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 96
frame #24: 0x00000001060acfcc libarrow_gtestd.1.11.0.dylib`testing::UnitTest::Run() + 216
frame #25: 0x0000000104dabef0 libarrow_gtest_maind.1.11.0.dylib`RUN_ALL_TESTS() + 16
frame #26: 0x0000000104dabed4 libarrow_gtest_maind.1.11.0.dylib`main + 76
frame #27: 0x0000000191de7154 dyld`start + 2476
thread #2
frame #0: 0x0000000192132aa4 libsystem_kernel.dylib`__workq_kernreturn + 8
thread #3, name = 'AwsEventLoop 1'
frame #0: 0x0000000192136f40 libsystem_kernel.dylib`kevent + 8
frame #1: 0x0000000105cea0d4 libaws-c-io.1.0.0.dylib`aws_event_loop_thread + 412
frame #2: 0x0000000105d4cd18 libaws-c-common.1.dylib`thread_fn + 340
frame #3: 0x0000000192171f94 libsystem_pthread.dylib`_pthread_start + 136
thread #4
frame #0: 0x00000001921345ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000019217255c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x0000000192097b14 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
frame #3: 0x000000011135129c libarrow.1800.dylib`arrow::internal::WorkerLoop(std::__1::shared_ptr<arrow::internal::ThreadPool::State>, std::__1::__list_iterator<std::__1::thread, void*>) + 1040
frame #4: 0x0000000111350e54 libarrow.1800.dylib`arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6::operator()() const + 88
frame #5: 0x0000000111350dc8 libarrow.1800.dylib`decltype(std::declval<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>()()) std::__1::__invoke[abi:ue170006]<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6&&) + 24
frame #6: 0x0000000111350da4 libarrow.1800.dylib`void std::__1::__thread_execute[abi:ue170006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>&, std::__1::__tuple_indices<>) + 28
frame #7: 0x0000000111350a1c libarrow.1800.dylib`void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>>(void*) + 84
frame #8: 0x0000000192171f94 libsystem_pthread.dylib`_pthread_start + 136
thread #5
frame #0: 0x00000001921345ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000019217255c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x0000000192097b14 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
frame #3: 0x000000011135129c libarrow.1800.dylib`arrow::internal::WorkerLoop(std::__1::shared_ptr<arrow::internal::ThreadPool::State>, std::__1::__list_iterator<std::__1::thread, void*>) + 1040
frame #4: 0x0000000111350e54 libarrow.1800.dylib`arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6::operator()() const + 88
frame #5: 0x0000000111350dc8 libarrow.1800.dylib`decltype(std::declval<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>()()) std::__1::__invoke[abi:ue170006]<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6&&) + 24
frame #6: 0x0000000111350da4 libarrow.1800.dylib`void std::__1::__thread_execute[abi:ue170006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>&, std::__1::__tuple_indices<>) + 28
frame #7: 0x0000000111350a1c libarrow.1800.dylib`void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>>(void*) + 84
frame #8: 0x0000000192171f94 libsystem_pthread.dylib`_pthread_start + 136
thread #6
frame #0: 0x00000001921345ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000019217255c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x0000000192097b14 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
frame #3: 0x000000011135129c libarrow.1800.dylib`arrow::internal::WorkerLoop(std::__1::shared_ptr<arrow::internal::ThreadPool::State>, std::__1::__list_iterator<std::__1::thread, void*>) + 1040
frame #4: 0x0000000111350e54 libarrow.1800.dylib`arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6::operator()() const + 88
frame #5: 0x0000000111350dc8 libarrow.1800.dylib`decltype(std::declval<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>()()) std::__1::__invoke[abi:ue170006]<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6&&) + 24
frame #6: 0x0000000111350da4 libarrow.1800.dylib`void std::__1::__thread_execute[abi:ue170006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>&, std::__1::__tuple_indices<>) + 28
frame #7: 0x0000000111350a1c libarrow.1800.dylib`void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>>(void*) + 84
frame #8: 0x0000000192171f94 libsystem_pthread.dylib`_pthread_start + 136
thread #7
frame #0: 0x00000001921345ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000019217255c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x0000000192097b14 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
frame #3: 0x000000011135129c libarrow.1800.dylib`arrow::internal::WorkerLoop(std::__1::shared_ptr<arrow::internal::ThreadPool::State>, std::__1::__list_iterator<std::__1::thread, void*>) + 1040
frame #4: 0x0000000111350e54 libarrow.1800.dylib`arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6::operator()() const + 88
frame #5: 0x0000000111350dc8 libarrow.1800.dylib`decltype(std::declval<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>()()) std::__1::__invoke[abi:ue170006]<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6&&) + 24
frame #6: 0x0000000111350da4 libarrow.1800.dylib`void std::__1::__thread_execute[abi:ue170006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>&, std::__1::__tuple_indices<>) + 28
frame #7: 0x0000000111350a1c libarrow.1800.dylib`void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>>(void*) + 84
frame #8: 0x0000000192171f94 libsystem_pthread.dylib`_pthread_start + 136
thread #8
frame #0: 0x00000001921345ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000019217255c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x0000000192097b14 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
frame #3: 0x000000011135129c libarrow.1800.dylib`arrow::internal::WorkerLoop(std::__1::shared_ptr<arrow::internal::ThreadPool::State>, std::__1::__list_iterator<std::__1::thread, void*>) + 1040
frame #4: 0x0000000111350e54 libarrow.1800.dylib`arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6::operator()() const + 88
frame #5: 0x0000000111350dc8 libarrow.1800.dylib`decltype(std::declval<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>()()) std::__1::__invoke[abi:ue170006]<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6&&) + 24
frame #6: 0x0000000111350da4 libarrow.1800.dylib`void std::__1::__thread_execute[abi:ue170006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>&, std::__1::__tuple_indices<>) + 28
frame #7: 0x0000000111350a1c libarrow.1800.dylib`void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>>(void*) + 84
frame #8: 0x0000000192171f94 libsystem_pthread.dylib`_pthread_start + 136
thread #9
frame #0: 0x00000001921345ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000019217255c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x0000000192097b14 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
frame #3: 0x000000011135129c libarrow.1800.dylib`arrow::internal::WorkerLoop(std::__1::shared_ptr<arrow::internal::ThreadPool::State>, std::__1::__list_iterator<std::__1::thread, void*>) + 1040
frame #4: 0x0000000111350e54 libarrow.1800.dylib`arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6::operator()() const + 88
frame #5: 0x0000000111350dc8 libarrow.1800.dylib`decltype(std::declval<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>()()) std::__1::__invoke[abi:ue170006]<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6&&) + 24
frame #6: 0x0000000111350da4 libarrow.1800.dylib`void std::__1::__thread_execute[abi:ue170006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>&, std::__1::__tuple_indices<>) + 28
frame #7: 0x0000000111350a1c libarrow.1800.dylib`void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>>(void*) + 84
frame #8: 0x0000000192171f94 libsystem_pthread.dylib`_pthread_start + 136
thread #10
frame #0: 0x00000001921345ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000019217255c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x0000000192097b14 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
frame #3: 0x000000011135129c libarrow.1800.dylib`arrow::internal::WorkerLoop(std::__1::shared_ptr<arrow::internal::ThreadPool::State>, std::__1::__list_iterator<std::__1::thread, void*>) + 1040
frame #4: 0x0000000111350e54 libarrow.1800.dylib`arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6::operator()() const + 88
frame #5: 0x0000000111350dc8 libarrow.1800.dylib`decltype(std::declval<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>()()) std::__1::__invoke[abi:ue170006]<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6&&) + 24
frame #6: 0x0000000111350da4 libarrow.1800.dylib`void std::__1::__thread_execute[abi:ue170006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>&, std::__1::__tuple_indices<>) + 28
frame #7: 0x0000000111350a1c libarrow.1800.dylib`void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>>(void*) + 84
frame #8: 0x0000000192171f94 libsystem_pthread.dylib`_pthread_start + 136
thread #11
frame #0: 0x00000001921345ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000019217255c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x0000000192097b14 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
frame #3: 0x000000011135129c libarrow.1800.dylib`arrow::internal::WorkerLoop(std::__1::shared_ptr<arrow::internal::ThreadPool::State>, std::__1::__list_iterator<std::__1::thread, void*>) + 1040
frame #4: 0x0000000111350e54 libarrow.1800.dylib`arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6::operator()() const + 88
frame #5: 0x0000000111350dc8 libarrow.1800.dylib`decltype(std::declval<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>()()) std::__1::__invoke[abi:ue170006]<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6&&) + 24
frame #6: 0x0000000111350da4 libarrow.1800.dylib`void std::__1::__thread_execute[abi:ue170006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>&, std::__1::__tuple_indices<>) + 28
frame #7: 0x0000000111350a1c libarrow.1800.dylib`void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_6>>(void*) + 84
frame #8: 0x0000000192171f94 libsystem_pthread.dylib`_pthread_start + 136
(lldb)
This trace is sufficiently different from the previous one that I'm going to try to capture another trace later today. The earlier trace was in a different test and stuck on a different syscall. Will report back. |
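Collecting several backtraces spaced over a minute or two, as done above, could be scripted roughly like this. It is a sketch only: it assumes `lldb` is on the PATH and that attaching to the process is permitted on the runner, and the helper names, the default count, and the interval are hypothetical.

```python
import subprocess
import time


def lldb_backtrace_command(pid):
    """Build an lldb invocation that attaches, dumps all threads, and detaches."""
    return ["lldb", "-p", str(pid), "--batch", "-o", "bt all", "-o", "detach"]


def collect_backtraces(pid, count=6, interval_seconds=15.0,
                       run=subprocess.run, sleep=time.sleep):
    """Capture `count` backtraces of process `pid`, pausing between captures.

    `run` and `sleep` are injectable (hypothetical parameters) so the loop
    can be tested without a live process.
    """
    traces = []
    for i in range(count):
        result = run(lldb_backtrace_command(pid), capture_output=True, text=True)
        traces.append(result.stdout)
        if i < count - 1:
            # No need to wait after the final capture.
            sleep(interval_seconds)
    return traces
```

Comparing the captured outputs then shows whether the main thread stays parked in the same frame for the whole window.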
I wasn't able to reproduce the hang and get a debugger attached after numerous attempts today. I can keep trying but I'm not sure what a good next step here would be. The process I've been following has been to run a modified |
Thanks @amoeba . I think we're quite clear now that every hang that could be reproduced was simply stalled waiting for a network response, so it's probably not a bug in our S3 filesystem at least. We could perhaps try to disable the non-existent bucket tests on macOS to see if that makes our CI more reliable... |
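If we go the route of disabling those tests on macOS, one low-touch option is to exclude them from the CI invocation with GoogleTest's `--gtest_filter` negative patterns. A small sketch follows; the helper name is an assumption, and any test-name patterns passed to it would be hypothetical until checked against the real test names.

```python
import sys


def gtest_filter_arg(excluded_patterns, platform=sys.platform):
    """Return a --gtest_filter argument excluding the patterns on macOS, else None.

    GoogleTest treats everything after the '-' as colon-separated negative
    patterns. The patterns themselves are supplied by the caller and are
    hypothetical here.
    """
    if platform != "darwin" or not excluded_patterns:
        return None
    return "--gtest_filter=-" + ":".join(excluded_patterns)
```

On Linux this returns `None`, so the full suite still runs there and only the macOS job loses the flaky tests.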
Describe the bug, including details regarding any error messages, version, and platform.
The arrow-s3fs-test has been failing due to what appears to be a timeout on the "AMD64 macOS 12 C++" job for a while. A recent example is: https://github.com/apache/arrow/actions/runs/8179809257/job/22366589254?pr=40373#step:13:265 and the relevant output is:
gist for posterity.
I expect this causes a lot of noise for maintainers so figuring out why this test is timing out would be good.
Component(s)
C++, Continuous Integration