Segmentation fault in jl_collect_backedges #45444
Comments
I've seen this one also, but haven't been able to reproduce. |
Maybe an ASAN trace instead? https://buildkite.com/julialang/julia-master/builds/12051#5db78853-4ee6-4fe1-81e2-cbd26ff71465
|
Possibly a simple fix would be to wait to turn on GC (line 3184 in b3b229e) until after validate_new_code_instances (just a couple lines down).
|
Looking at latest daily PkgEval, there's a bunch of packages triggering this:
The lightest package here seems FranklinUtils, I'll try running that a bunch to try and get an |
I'm assigning myself to fix this, but happy to accept an |
I couldn't reproduce this locally, so I hacked Link to an rr trace will be appended at the end of the log of packages that abort/segfault/reach an unreachable/report GC corruption. Example here: https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/e1739aa/report.html (although I let it unconditionally submit the log here). I do expect some false positives because of running under rr though, so I'll revert this soon after getting a (hopefully useful) report here. e1739aa#commitcomment-78790001=

EDIT: well, that was underwhelming. There's some work to be done in BugReporting.jl before this will work, so I might not have an rr trace before tomorrow. In any case, this would be a useful addition to PkgEval, so I'll have a look anyway.

EDIT2: A new attempt successfully recorded all critical package failures, but of course there wasn't a jl_collect_backedges segfault among them. I'll try once more when I get the rr recording functionality merged. |
In a debug build I hit #46064, the InteractiveUtils case:
Expr not allowed in value position
Internal error: encountered unexpected error in runtime:
ErrorException("")
error at ./error.jl:35
check_op at ./compiler/ssair/verify.jl:52
verify_ir at ./compiler/ssair/verify.jl:243
verify_ir at ./compiler/ssair/verify.jl:79 [inlined]
run_passes at ./compiler/optimize.jl:590
... |
I've not had any luck capturing this locally either, even in a debug build with many more assertions turned on. Given the line number I'm nevertheless about 60% hopeful that #46148 will fix it. |
The edge-restore algorithm here is pretty bad now, but this should hopefully fix #45444
I don't think this was fixed, as it occurred on a recent PkgEval run: https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_date/2022-08/03/BasicBSpline.primary.log (this was on eedf3f1). Sadly, due to a bug in PkgEval.jl we didn't upload the rr recording... |
Finally caught this in rr: https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/rr/UrlDownload-1659617146.tar.zst. If you want to replay this using BugReporting.jl, this brings you right to the crash: |
That run failed to include #46171 |
Ugh, I assumed it had been back-ported already. Let's assume it's still fixed then. |
This still happens, now spotted on 686afd3 which definitely includes the fix. PkgEval log: https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_date/2022-08/09/report.html, relevant package log: https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_date/2022-08/09/ComoniconGUI.primary.log
rr recording: https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/rr/ComoniconGUI-1660060597.tar.zst
To get to the segfault:
using BugReporting
replay("https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/rr/ComoniconGUI-1660060597.tar.zst"; rr_replay_flags=`--onprocess 21 --goto 194177`)
|
Latest PkgEval had 3 instances of this bug:
All with an rr trace at the bottom of the log. |
The edge-restore algorithm here is pretty bad now, but this should hopefully fix JuliaLang#45444
Failure caught with a debug build:
|
Seen in PkgEval in #45195 on both sides of the comparison.
https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/39a24eb_vs_5554676/PANDA.against.log