-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segfault in 370 mode IPL #538
Comments
I'm getting nowhere. I can't seem to cause a core file to get created even though my
So the core file is never getting created and I have no idea why. I tried running Herc from gdb instead, but I can't even get that to work either! It can't seem to even start it. It keeps saying: (gdb) run -f ${HOME}/hercules/HERC/CFG/fish-dosvs-550-hyperion-4.0.txt -o ${HOME}/hercules.log
Starting program: /home/fish/hercules/hercules-0/.libs/hercules -f ${HOME}/hercules/HERC/CFG/fish-dosvs-550-hyperion-4.0.txt -o ${HOME}/hercules.log
/home/fish/hercules/hercules-0/.libs/hercules: error while loading shared libraries: libherc.so: cannot open shared object file: No such file or directory
[Inferior 1 (process 2342) exited with code 0177]
(gdb) This happens no matter what directory I start gdb from. I give up. My Linux skills are simply too lacking. |
I do more tomorrow. |
Fish, those commands also work to enable core files on Kubuntu. Here's the output, on Kubuntu, with
|
OK, I've found the change that makes it fail. Why, I don't know yet. 52b5df6#diff-8c254e4d1ce0ab4fa655b87eadddfb19246ac185d0336b659c22702d9dd95a8f The changes in
|
That evaluates to the same value, though: |
Yeah, you'd think so. And it looks right in Compiler Explorer I suspect the problem is when it's applied later with an Specifically it is
|
Works correctly when compiled with Clang 13! (yesterday I was thinking, I bet Fish will get to exercise his dislike of gcc before we're done with this one) |
Do you really blame me? Fucks up with one version but works fine with a later one? What would you conclude? Obvious compiler bug. Now I'm not saying other compilers don't also occasionally introduce bugs of their own, Microsoft's included. No. They obviously do. Humans are fallible creatures after all, and programming can be quite challenging at times. Especially for compiler writers. But it just seems to me that we seem to stumble across these types of bugs WAY more frequently with gcc than we do with Microsoft's product, thereby forcing me to ask the obvious question "Why is that?". And the only logical conclusion I can come to in my mind is that the gcc programmers are either: a) incompetent, or b) working with a poorly designed/written product, thereby causing otherwise competent and skilled programmers to fuck up much more frequently than otherwise (to our unfortunate detriment). Linux is generally a fine product. Superior to Microsoft's in many ways. (but not so much in others and vice-versa) But their compiler? It truly sucks. Oh sure! It's fine when it works! No question! But it seems to break way more frequently than it should IMO. WAY more. And that should be cause for concern. (And unlike Microsoft's, it's not exactly user friendly either IMHO) But enough of that... Do you want to do the honors and commit the fix, Bill, since you were the one that found/identified it after all (which I presume would be to simply change that one #define back to its original |
Fish,
No, can't blame you.
I'll do it. But I want to experiment a bit more first. Bill |
Hmmm...I raised the question to a friend, and he argues that, under the C
standard's strict definition, the expression should be
(~((unsigned) 0) << SHIFT_4K) ... the reasoning being that ~ applied to a
signed zero could have implementation defined effects that might be
surprising.
Might be worth a try.
…On Thu, Jan 19, 2023 at 3:01 PM Bill Lewis ***@***.***> wrote:
Fish,
Do you really blame me? Fucks up with one version but works fine with a
later one? What would you conclude? Obvious compiler bug.
No, can't blame you.
Do you want to do the honors and commit the fix, Bill, since you were the
one that found/identified it after all (which I presume would be to simply
change that one #\define back to its original 0x000007FF value?) Or would
you like me to commit it since it was after all caused by my commit? Or do
you, Jay, want to commit the fix, since it was you that discovered the bug?
I'll do it. But I want to experiment a bit more first.
Bill
—
Reply to this email directly, view it on GitHub
<#538 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAKSM54PJDM5MEOK4GJ3E6LWTGTSVANCNFSM6AAAAAAT7YOPGU>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
--
Jay Maynard
|
Yeah, I wondered about that too. |
Or, for completeness...
#define STORAGE_4K_BYTEMASK (~(~((unsigned) 0) << SHIFT_4K))
…On Thu, Jan 19, 2023 at 3:12 PM Jay Maynard ***@***.***> wrote:
Hmmm...I raised the question to a friend, and he argues that, under the C
standard's strict definition, the expression should be
(~((unsigned) 0) << SHIFT_4K) ... the reasoning being that ~ applied to a
signed zero could have implementation defined effects that might be
surprising.
Might be worth a try.
On Thu, Jan 19, 2023 at 3:01 PM Bill Lewis ***@***.***>
wrote:
> Fish,
>
> Do you really blame me? Fucks up with one version but works fine with a
> later one? What would you conclude? Obvious compiler bug.
>
> No, can't blame you.
>
> Do you want to do the honors and commit the fix, Bill, since you were the
> one that found/identified it after all (which I presume would be to simply
> change that one #\define back to its original 0x000007FF value?) Or would
> you like me to commit it since it was after all caused by my commit? Or do
> you, Jay, want to commit the fix, since it was you that discovered the bug?
>
> I'll do it. But I want to experiment a bit more first.
>
> Bill
>
> —
> Reply to this email directly, view it on GitHub
> <#538 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AAKSM54PJDM5MEOK4GJ3E6LWTGTSVANCNFSM6AAAAAAT7YOPGU>
> .
> You are receiving this because you authored the thread.Message ID:
> ***@***.***>
>
--
Jay Maynard
--
Jay Maynard
|
Interesting! So basically: - #define STORAGE_KEY_2K_BYTEMASK (~(((unsigned)(~0)) << SHIFT_2K))
+ #define STORAGE_KEY_2K_BYTEMASK (~((~(unsigned) 0 ) << SHIFT_2K))
- #define STORAGE_KEY_4K_BYTEMASK (~(((unsigned)(~0)) << SHIFT_4K))
+ #define STORAGE_KEY_4K_BYTEMASK (~((~(unsigned) 0 ) << SHIFT_4K)) Yes? |
Doesn't fix it. |
Yup. Or even ~0u instead of ~(unsigned)0.
…On Thu, Jan 19, 2023 at 4:04 PM Fish-Git ***@***.***> wrote:
Interesting! So basically:
- #define STORAGE_KEY_2K_BYTEMASK (~(((unsigned)(~0)) << SHIFT_2K))
+ #define STORAGE_KEY_2K_BYTEMASK (~((~(unsigned) 0 ) << SHIFT_2K))
- #define STORAGE_KEY_4K_BYTEMASK (~(((unsigned)(~0)) << SHIFT_4K))
+ #define STORAGE_KEY_4K_BYTEMASK (~((~(unsigned) 0 ) << SHIFT_4K))
Yes?
—
Reply to this email directly, view it on GitHub
<#538 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAKSM52KLKCOIQKG4ANFUODWTG27BANCNFSM6AAAAAAT7YOPGU>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
--
Jay Maynard
|
You beat me to it! I was just going to try it. I guess I won't now. So (gcc(?) clang(?) which one?) screws up either way? Yes? So we're forced to use 0x000007ff? Yes? |
I'd suggest ~0U, and if that doesn't fix it, then shrug and use 0x7ff.
…On Thu, Jan 19, 2023 at 4:09 PM Fish-Git ***@***.***> wrote:
Doesn't fix it.
You beat me to it! I was just going to try it. I guess I won't now.
So (gcc(?) clang(?) which one?) screws up either way? Yes? So we're forced
to use 0x000007ff? Yes?
—
Reply to this email directly, view it on GitHub
<#538 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAKSM52J5M5VIRYO2E4USHTWTG3RLANCNFSM6AAAAAAT7YOPGU>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
--
Jay Maynard
|
That doesn't work either. (SIGH!!) So I guess we're stuck with 7ff. |
Fish,
Tested gcc 11.3.0 and 12.1.0. Both fick up, with everything other than the simply constant. Clang 13.0.1 works with the original Bill |
There is a way with git to update the commit that introduces the problem. But it's tricky and I don't want to risk wrecking things. So I will just check in a new commit at the head, and we'll accept that a small range of commits, in between, don't work. OK? |
That's the way development on stuff like this usually goes. Don't sweat it.
…On Fri, Jan 20, 2023 at 9:54 AM Bill Lewis ***@***.***> wrote:
There is a way with git to update the commit that introduces the problem.
But it's tricky and I don't want to risk wrecking things. So I will just
check in a new commit at the head, and we'll accept that a small range of
commits, in between, don't work.
OK?
—
Reply to this email directly, view it on GitHub
<#538 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAKSM5YZ3FB2IO6G6L5TIBDWTKYM7ANCNFSM6AAAAAAT7YOPGU>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
--
Jay Maynard
|
Bill said:
Please don't! Bill said:
Yes please! Just do a normal commit to HEAD, just like always, just like you'd do any other commit. Undoing an accidental commit to HEAD like what occurred when Jay's As Jay said:
Quite right. This is the development branch after all. It's expected that things like this will occasionally happen. When it does, we just commit the fix for it and move on. It's wholly unimportant that a small range of commits "breaks" Hercules. The only thing that's important is the end result. Only the tip of the branch is important (i.e. the current state of the branch), not any previous state. The current (most recent) commit should (ideally) build cleanly and function correctly. If it doesn't and we somehow miss that (like we did with this current GitHub Issue), we simply commit the fix for it and move on. It has happened countless times before in the history of Hercules development and will likely happen again eventually. That's just the way software development works. So PLEASE DON'T try to do whatever it was you were thinking of maybe doing, and just do a normal commit. It's safer and will fix the problem, which is all we're really interested in. Thanks. |
OK I'm testing now on various other systems. Just an update... The failing code also fails on Clang 13.1.6 on Mac M1 CPU. The correction works. I suspect we've just fallen afoul of C undefined behavior. And god knows where. |
If you're curious however and want to experiment on whether what you were thinking of doing would work or not, feel free to give it a try, BUT NOT ON OUR LIVE RESPOSITORY! We actually have two test (dummy) repositories for performing git experiments with: (see https://github.com/SDL-Hercules-390). These test repositories were expressly created for such experimenting. If you screw up and the repository gets fucked up, it's no big deal. We can always recreate them. But NEVER do any experimenting on (anything risky to) our current production repository! Cool? |
Of course. |
Changes in commit 52b5df6 exacerbated a compiler bug and/or C undefined behavior.
Hopefully addressed with commit 1159891 Tested on:
|
Confirmed on Kubuntu 21.10 with:
Thank you Bill! Closing GitHub Issue. |
This was discovered while trying to build and run the develop branch after #537 (comment) added the ADDFRR instruction, though subsequent testing determined it actually was introduced earlier.
Symptom: Attempting to IPL any system with
ARCHMODE S/370
immediately segfaults. This only happens on Linux; Windows appears unaffected.The failure appears in the run CPU loop. More information can be found in the pull request comments.
The text was updated successfully, but these errors were encountered: