Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log format during shutdown (Issue 480) #634

Merged
merged 3 commits into from
Feb 24, 2024

Conversation

JamesWekel
Copy link
Contributor

@JamesWekel JamesWekel commented Feb 23, 2024

Here is a 2nd solution to fix two log format issues during shutdown Issue 480:

  1. duplicate log lines on non-Windows systems.

Issue related to logger processing and timing of shutdown logger_unredirect(). To fix the timing issue, a system shutbegin flag notifies logger_thread() that shutdown is beginning and to execute logger_unredirect() before setting system shutdown flag. Timing is now controlled by logger_thread().

  1. interleaved log lines with two time stamps or missing a timestamp.

Update logger_timestamped_logfile_write to have a lock if timestamps are required.

Tested using MVS TK5 startup/shutdown on Armbain Jammy (Ubuntu) on Raspberry Pi4 and Intel Nuc, and on Windows (Intel Nuc). Missing log lines (e.g. 'hao_thread' ended) from previous 'bad' solution are included in the log. Not sure how to explicitly test problem #2. My test runs do not show any interleaved log lines but that doesn't imply the problem is resolved.

Note: this solution does not make any attempt to resolve missing shutdown messages on the console.

I would appreciate further testing on other systems and any comments on the solution.

Thanks,
Jim

Copy link
Member

@Fish-Git Fish-Git left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ONE minor glitch...

In the do_shutdown_now() function, you need to move your declaration for spincount up to the top where your loggersetshutdown is being declared.

Yes, I know most modern versions of 'C' these days now allow you to declare variables anywhere before you reference them, but older compiler versions do not. They require them to all be declared at the beginning of the function.

Other than that ONE TINY THING, everything else looks fine to me!

Make that one change and I'll be happy to accept your Pull Request!

Thank you for your continued contributions, James!  :)

@JamesWekel
Copy link
Contributor Author

JamesWekel commented Feb 24, 2024

Fish,

Not a problem. I welcome the review and comments. Change completed.

Jim

Copy link
Member

@Fish-Git Fish-Git left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@Fish-Git Fish-Git merged commit d1a6995 into SDL-Hercules-390:develop Feb 24, 2024
@Fish-Git
Copy link
Member

Merged!

Thanks again, James!

@wrljet
Copy link
Member

wrljet commented Feb 24, 2024

We do still need to fully test it out on other systems, non-Linux, BSDs, macOS, etc.
I'll do some of them randomly.

@Fish-Git
Copy link
Member

We do still need to fully test it out on other systems, non-Linux, BSDs, macOS, etc.
I'll do some of them randomly.

Damn! I keep forgetting about that. I'll give it a try on my Kubuntu 21.10 system too.

@Fish-Git
Copy link
Member

FWIW, it seems to work just fine on both my Kubuntu 21.10 and my old CentOS 6.10 system too, both virtual machines running under VMware WorkStation (not that that makes any difference of course).

Should we close GitHub Issue #480 now?

@wrljet
Copy link
Member

wrljet commented Feb 24, 2024

I'd like to try it first on BSD and macOS before closing.

@wrljet
Copy link
Member

wrljet commented Feb 25, 2024

Works on macOS (Sonoma 14.4 Beta).

I'll try BSD next.

@Fish-Git
Copy link
Member

Well? Is BSD okay?

I'd like to close issue #480!

@wrljet
Copy link
Member

wrljet commented Mar 1, 2024

Fish,

So just tried things out with the latest commit, on FreeBSD 14, and get endless errors in 'make check':

I then rebuilt on Debian, and getting the same there.

The very recent change for that processor stopped message is confusing/breaking the tests?

For example:

Variable $cmpxchg8           set to "1"
Variable $cmpxchg16          set to "1"
Variable $hatomics           set to "C11"
>>>>> line   153: Received unexpected wait state:  000A0000 00000000; processor stopped
Test "1403 printer Load UCS Buffer":  1 OK compares.  One failure.
SKIPPING: Testcase 3211 printer
REASON:   No Hercules Rexx support.
>>>>> line  1255: Received unexpected wait state:  0002000000000000 0000000000000000; processor stopped
>>>>> line  1258: Received unexpected wait state:  0002000000000000 0000000000000000; processor stopped
Test "CBUC (Concurrent Block Update Consistency)":  0 OK compares.  2 failures.
>>>>> line  1317: Received unexpected wait state:  0002000000000000 0000000000000000; processor stopped
Test "CCW-ILS (CCW Incorrect Length Suppression)":  1 OK compares.  One failure.
>>>>> line  1359: Received unexpected wait state:  0002000000000000 0000000000000000; processor stopped
>>>>> line  1360: Received unexpected wait state:  0002000000000000 0000000000000000; processor stopped
Test "cmpxchg16 as used by CDSG, STPQ and LPQ instructions":  0 OK compares.  2 failures.
>>>>> line  1401: Received unexpected wait state:  000A0000 00000000; processor stopped
Test "CLCL-et-al (Test CLCL, MVCIN and TRT instructions)":  0 OK compares.  One failure.
>>>>> line  1424: Received unexpected wait state:  000A0000 00000000; processor stopped
Test "CLCL Unaligned Buffers Test":  0 OK compares.  One failure.
>>>>> line  1448: Received unexpected wait state:  000A0000 00000000; processor stopped
Test "CLCLE-01-unaligned-buffers Test":  0 OK compares.  One failure.
>>>>> line  1472: Received unexpected wait state:  000A0000 00000000; processor stopped
Test "CLCLE-02-unaligned-buffers Test":  0 OK compares.  One failure.
>>>>> line  1496: Received unexpected wait state:  000A0000 00000000; processor stopped
Test "CLCLE-03-basic (Test CLCLE instructions)":  0 OK compares.  One failure.
>>>>> line  1520: Received unexpected wait state:  000A0000 00000000; processor stopped
Test "CLCLE-04-performance (Test CLCLE instructions)":  0 OK compares.  One failure.
>>>>> line  1546: Received unexpected wait state:  0002000000000000 0000000000000000; processor stopped
Test "CMPSC (Compression Call)":  0 OK compares.  One failure.
>>>>> line  1572: Received unexpected wait state:  0002000180000000 0000000000000000; processor stopped
Test "CU12-01-xpage (Test CU12 instruction)":  0 OK compares.  One failure.
>>>>> line  1599: Received unexpected wait state:  0002000180000000 0000000000000000; processor stopped

...

Test "multix1":  1 OK compares.  One failure.
>>>>> line 25399: Received unexpected wait state:  0002000180000000 0000000000000000; processor stopped
Test "zeos":  5 OK compares.  One failure.
Did 240 tests.  233 failed; 7 OK.
gmake: *** [Makefile:3602: check] Error 233

Bill

@wrljet
Copy link
Member

wrljet commented Mar 1, 2024

As far as this issue with the log output during shutdown is concerned, though, the changes to seem to work correctly on FreeBSD.

So I'd say we're good to go.

Bill

@JamesWekel
Copy link
Contributor Author

Bill: Thank you for all the testing, especially pointing out the problems with the first fix version.

Fish: thank you for the extra tests and reviewing the pull request.

Much appreciate all your support.

Jim

@Fish-Git
Copy link
Member

Fish-Git commented Mar 1, 2024

The very recent change for that processor stopped message is confusing/breaking the tests?

Yep.  It sure as hell is.

(F**K!)

I jumped right on it and fixed it as soon as I got up this morning and saw your message, but also noticed another unrelated problem too that I introduced recently: the E7Prefix "Various CKD dasd I/O tests" test keeps failing because its shadow file isn't being deleted when the test finishes.  :(

If you manually delete all shadow files in the tests directory before running your 'runtest', it works fine. But if you immediately try it again, it fails because the old shadow file is still there when the test starts even though its .tst script clearly said to discard it when the test was over!  (sf-a80 nomerge. The nomerge should be deleting the shadow file but for reasons unknown it's not.)

I already have the fix for the "processor stopped" SNAFU ready to go and it does indeed correct the cause for all of the other 230+ failures. I'll go ahead and commit it ASAP.

BUT...   be aware that the other bug -- the 'sf' command's failure to delete the file when nomerge is specified -- will still remain, causing that one test to continue to fail whenever the old (previous) shadow file is still around when the test runs. So until I can find and fix it, just be sure no shadow files exist in the tests directory before running your tests and all tests should pass.

@Fish-Git
Copy link
Member

Fish-Git commented Mar 1, 2024

As far as this issue with the log output during shutdown is concerned, though, the changes to seem to work correctly on FreeBSD.

So I'd say we're good to go.

Good. I'll go ahead and close issue #480 then. Thanks.

@Fish-Git
Copy link
Member

Fish-Git commented Mar 1, 2024

BUT...   be aware that the other bug -- the 'sf' command's failure to delete the file when nomerge is specified -- will still remain, causing that one test to continue to fail whenever the old (previous) shadow file is still around when the test runs. So until I can find and fix it, just be sure no shadow files exist in the tests directory before running your tests and all tests should pass.

Fixed!

@wrljet
Copy link
Member

wrljet commented Mar 1, 2024

The very recent change for that processor stopped message is confusing/breaking the tests?

Yep. It sure as hell is.

(FK!)**

I jumped right on it and fixed it as soon as I got up this morning and saw your message, but also noticed another unrelated problem too that I introduced recently: the E7Prefix "Various CKD dasd I/O tests" test keeps failing because its shadow file isn't being deleted when the test finishes. :(

If you manually delete all shadow files in the tests directory before running your 'runtest', it works fine. But if you immediately try it again, it fails because the old shadow file is still there when the test starts even though its .tst script clearly said to discard it when the test was over! (sf-a80 nomerge. The nomerge should be deleting the shadow file but for reasons unknown it's not.)

I already have the fix for the "processor stopped" SNAFU ready to go and it does indeed correct the cause for all of the other 230+ failures. I'll go ahead and commit it ASAP.

BUT... be aware that the other bug -- the 'sf' command's failure to delete the file when nomerge is specified -- will still remain, causing that one test to continue to fail whenever the old (previous) shadow file is still around when the test runs. So until I can find and fix it, just be sure no shadow files exist in the tests directory before running your tests and all tests should pass.

Both issues tested and verifiied are fixed as of commit: 140a0cf

Tests work, and are "serially reusable".

@wrljet
Copy link
Member

wrljet commented Mar 1, 2024

Thank you, James, for your work and contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants