CI: New Intermittent segfault #46152

Closed
Keno opened this issue Jul 24, 2022 · 7 comments
Labels
ci Continuous integration

Comments

@Keno
Member

Keno commented Jul 24, 2022

We appear to have a new intermittent segfault on CI. I regularly see ProcessExited(139) on macOS builders, but it's possible that failures on various other systems are also related. See below for a table of recent macOS logs that failed with this error. It looks like this might have started around Jul 4 or so.

Table of recent ProcessExited(139) failures

12 rows × 9 columns

| # | agent (String) | commit (String) | date (DateTime) | elapsed (Float64) | key (String) | log_url (String) | state (String) | uuid (UUID) | web_url (String) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | default-macmini-x64-5.0 | 743578a | 2022-07-22T05:11:28.008 | 4911.28 | test_x86_64-apple-darwin | https://api.buildkite.com/v2/organizations/julialang/pipelines/julia-master/builds/14170/jobs/01822451-f8cd-43a7-a4b4-83d37d4236e4/log.txt | failed | 01822451-f8cd-43a7-a4b4-83d37d4236e4 | https://buildkite.com/julialang/julia-master/builds/14170#01822451-f8cd-43a7-a4b4-83d37d4236e4 |
| 2 | default-macmini-x64-5.0 | 028e9ff | 2022-07-20T15:15:51.024 | 8267.88 | test_x86_64-apple-darwin | https://api.buildkite.com/v2/organizations/julialang/pipelines/julia-master/builds/14094/jobs/01821c2e-dd00-4443-89d2-a1f346cf8bda/log.txt | failed | 01821c2e-dd00-4443-89d2-a1f346cf8bda | https://buildkite.com/julialang/julia-master/builds/14094#01821c2e-dd00-4443-89d2-a1f346cf8bda |
| 3 | default-macmini-aarch64-1.0 | e96b19d | 2022-07-14T18:22:37.245 | 4810.07 | test_aarch64-apple-darwin | https://api.buildkite.com/v2/organizations/julialang/pipelines/julia-master/builds/13912/jobs/0181fdf3-8b9c-4341-b2a6-2301ec77ce8f/log.txt | failed | 0181fdf3-8b9c-4341-b2a6-2301ec77ce8f | https://buildkite.com/julialang/julia-master/builds/13912#0181fdf3-8b9c-4341-b2a6-2301ec77ce8f |
| 4 | default-macmini-aarch64-1.0 | 9815a8e | 2022-07-13T15:43:51.353 | 3703.01 | test_aarch64-apple-darwin | https://api.buildkite.com/v2/organizations/julialang/pipelines/julia-master/builds/13836/jobs/0181f95e-1f7d-420f-b0c6-86a38d2872ad/log.txt | failed | 0181f95e-1f7d-420f-b0c6-86a38d2872ad | https://buildkite.com/julialang/julia-master/builds/13836#0181f95e-1f7d-420f-b0c6-86a38d2872ad |
| 5 | default-macmini-aarch64-1.0 | 201d4f6 | 2022-07-13T09:02:25.148 | 3797.28 | test_aarch64-apple-darwin | https://api.buildkite.com/v2/organizations/julialang/pipelines/julia-master/builds/13824/jobs/0181f6cc-1f9e-40df-98e4-9858799342a9/log.txt | failed | 0181f6cc-1f9e-40df-98e4-9858799342a9 | https://buildkite.com/julialang/julia-master/builds/13824#0181f6cc-1f9e-40df-98e4-9858799342a9 |
| 6 | default-macmini-x64-2.0 | 201d4f6 | 2022-07-13T09:02:25.148 | 5804.43 | test_x86_64-apple-darwin | https://api.buildkite.com/v2/organizations/julialang/pipelines/julia-master/builds/13824/jobs/0181f6cc-1dd9-4c19-a21b-8c1115c7933a/log.txt | failed | 0181f6cc-1dd9-4c19-a21b-8c1115c7933a | https://buildkite.com/julialang/julia-master/builds/13824#0181f6cc-1dd9-4c19-a21b-8c1115c7933a |
| 7 | default-macmini-x64-2.0 | c1d21e1 | 2022-07-13T07:32:31.174 | 5450.19 | test_x86_64-apple-darwin | https://api.buildkite.com/v2/organizations/julialang/pipelines/julia-master/builds/13821/jobs/0181f679-cd57-41c5-b95e-853529344ad5/log.txt | failed | 0181f679-cd57-41c5-b95e-853529344ad5 | https://buildkite.com/julialang/julia-master/builds/13821#0181f679-cd57-41c5-b95e-853529344ad5 |
| 8 | default-macmini-aarch64-2.0 | 87558f6 | 2022-07-08T19:40:26.457 | 4463.86 | test_aarch64-apple-darwin | https://api.buildkite.com/v2/organizations/julialang/pipelines/julia-master/builds/13634/jobs/0181df54-7655-4fcd-97ec-5df1b9421462/log.txt | failed | 0181df54-7655-4fcd-97ec-5df1b9421462 | https://buildkite.com/julialang/julia-master/builds/13634#0181df54-7655-4fcd-97ec-5df1b9421462 |
| 9 | default-macmini-aarch64-1.0 | da13d78 | 2022-07-07T08:37:50.394 | 2840.45 | test_aarch64-apple-darwin | https://api.buildkite.com/v2/organizations/julialang/pipelines/julia-master/builds/13599/jobs/0181d7cf-7da6-4f17-8de0-b0a49c6ff550/log.txt | failed | 0181d7cf-7da6-4f17-8de0-b0a49c6ff550 | https://buildkite.com/julialang/julia-master/builds/13599#0181d7cf-7da6-4f17-8de0-b0a49c6ff550 |
| 10 | default-macmini-aarch64-3.0 | 4d50ff8 | 2022-07-06T12:35:07.440 | 6386.15 | test_aarch64-apple-darwin | https://api.buildkite.com/v2/organizations/julialang/pipelines/julia-master/builds/13567/jobs/0181d385-141c-436f-bfb9-4eb9af26a3d4/log.txt | failed | 0181d385-141c-436f-bfb9-4eb9af26a3d4 | https://buildkite.com/julialang/julia-master/builds/13567#0181d385-141c-436f-bfb9-4eb9af26a3d4 |
| 11 | default-macmini-aarch64-4.0 | 438bde4 | 2022-07-04T19:45:11.110 | 5068.01 | test_aarch64-apple-darwin | https://api.buildkite.com/v2/organizations/julialang/pipelines/julia-master/builds/13491/jobs/0181cac1-34d3-4c25-9a6c-c7a7d39352fc/log.txt | failed | 0181cac1-34d3-4c25-9a6c-c7a7d39352fc | https://buildkite.com/julialang/julia-master/builds/13491#0181cac1-34d3-4c25-9a6c-c7a7d39352fc |
| 12 | default-macmini-aarch64-5.0 | fec6951 | 2022-06-30T17:21:33.759 | 4715.9 | test_aarch64-apple-darwin | https://api.buildkite.com/v2/organizations/julialang/pipelines/julia-master/builds/13360/jobs/0181b5a2-791b-4ec9-a4fe-6c7aec87d849/log.txt | failed | 0181b5a2-791b-4ec9-a4fe-6c7aec87d849 | https://buildkite.com/julialang/julia-master/builds/13360#0181b5a2-791b-4ec9-a4fe-6c7aec87d849 |
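
For readers unfamiliar with the number in ProcessExited(139): assuming it is a shell-style exit status, values above 128 conventionally encode "killed by signal (status - 128)", so 139 corresponds to signal 11, which is SIGSEGV on macOS and Linux. The sketch below is only an illustration of that convention; the helper name `describe_exit_code` is hypothetical and not part of Julia or Buildkite.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status.

    Assumption: statuses above 128 follow the common shell convention
    of 128 + signal number. This is a convention, not a guarantee.
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(139))
```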
@Keno Keno added the ci Continuous integration label Jul 24, 2022
@Keno
Member Author

Keno commented Jul 24, 2022

The commit before that first failure was the merge of #45861. @vtjnash any chance that's related?

Actually I missed a previous failure on Jun 30.
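
The start date can be checked mechanically against the table. The following is a minimal sketch that hard-codes the `date` and `key` columns from the table above (nothing is fetched from Buildkite) and reports the earliest failure and the count per platform; the function name `summarize` is an illustrative choice.

```python
from collections import Counter
from datetime import datetime

# (date, key) pairs copied from the table of ProcessExited(139) failures.
FAILURES = [
    ("2022-07-22T05:11:28.008", "test_x86_64-apple-darwin"),
    ("2022-07-20T15:15:51.024", "test_x86_64-apple-darwin"),
    ("2022-07-14T18:22:37.245", "test_aarch64-apple-darwin"),
    ("2022-07-13T15:43:51.353", "test_aarch64-apple-darwin"),
    ("2022-07-13T09:02:25.148", "test_aarch64-apple-darwin"),
    ("2022-07-13T09:02:25.148", "test_x86_64-apple-darwin"),
    ("2022-07-13T07:32:31.174", "test_x86_64-apple-darwin"),
    ("2022-07-08T19:40:26.457", "test_aarch64-apple-darwin"),
    ("2022-07-07T08:37:50.394", "test_aarch64-apple-darwin"),
    ("2022-07-06T12:35:07.440", "test_aarch64-apple-darwin"),
    ("2022-07-04T19:45:11.110", "test_aarch64-apple-darwin"),
    ("2022-06-30T17:21:33.759", "test_aarch64-apple-darwin"),
]


def summarize(failures):
    """Return the earliest failure timestamp and a per-key failure count."""
    dates = [datetime.fromisoformat(stamp) for stamp, _ in failures]
    per_key = Counter(key for _, key in failures)
    return min(dates), per_key


if __name__ == "__main__":
    earliest, per_key = summarize(FAILURES)
    print("earliest failure:", earliest.isoformat())
    for key, count in sorted(per_key.items()):
        print(key, count)
```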

@vtjnash
Member

vtjnash commented Jul 24, 2022

Is that the atexit hook test failing? I think someone did change that recently (#45765). I think that test pattern might still be expected to fail sometimes if you are running with threads forced on, so can we please stop doing that in CI?

@giordano
Contributor

I think someone did change that recently (#45765).

That'd match the fact that the first failing job was found on June 30th: the PR was merged the day before.

@Keno
Member Author

Keno commented Jul 24, 2022

The test that's failing is not the one that was added in that PR though. It's usually either the sysimg-code-native=no test or one of the LibGit2 tests.

Keno added a commit that referenced this issue Jul 24, 2022
This changes the mach exception server to ignore fatal SIGSEGVs,
letting regular kernel processing handle it (by performing POSIX
signal delivery and then subsequently coredumping), rather than
quitting the process directly. There's probably some way to
induce the kernel to perform core dumping directly from the
exception server, but I think it'll be less confusing all around
to just have segfaults take the standard path.

Hoping this will help debug #46152.
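
To illustrate what the "standard path" in the commit message looks like from the outside, here is a minimal POSIX-only sketch. It does not touch or reproduce the Mach exception server. It only shows a child process dying from SIGSEGV with the default signal disposition, and the parent observing that. The function name and the child one-liner are illustrative assumptions; the child disables core files so the demo leaves nothing behind, which is the opposite of what one would want when actually debugging.

```python
import signal
import subprocess
import sys

# Child program: disable core files for this demo, then deliver SIGSEGV
# to itself. Python installs no SIGSEGV handler by default, so the
# kernel applies the default action and terminates the process.
CHILD_CODE = (
    "import os, resource, signal; "
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0)); "
    "os.kill(os.getpid(), signal.SIGSEGV)"
)


def run_child_that_segfaults() -> int:
    """Run the child and return its return code as seen by subprocess."""
    proc = subprocess.run([sys.executable, "-c", CHILD_CODE])
    return proc.returncode


if __name__ == "__main__":
    rc = run_child_that_segfaults()
    # On POSIX, subprocess reports death-by-signal as a negative number.
    if rc < 0:
        signum = -rc
        print("child killed by", signal.Signals(signum).name)
        print("a shell would typically report status", 128 + signum)
    else:
        print("child exited with status", rc)
```

On POSIX systems `subprocess` reports a signal death as the negated signal number, whereas shells usually report 128 plus the signal number, which is where a status such as 139 comes from.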
@vtjnash
Member

vtjnash commented Jul 25, 2022

Both would be very likely to fail if the PR fixed the issue only incompletely, and if people keep configuring CI to run with threads (which is not supported yet).

@DilumAluthge
Member

and if people keep configuring CI to run with threads (which is not supported yet)

JuliaCI/julia-buildkite#185

Keno added a commit that referenced this issue Jul 25, 2022
@Keno
Member Author

Keno commented Jul 26, 2022

Appears to have been resolved by turning off threading.

@Keno Keno closed this as completed Jul 26, 2022