10 seconds intermittent delay with multiple paths to a DASD device #624

Closed
SE20225 opened this issue Feb 12, 2024 · 5 comments

Comments


SE20225 commented Feb 12, 2024

I have an interest in 370-style multiprocessors running MVS (now TK5, with an IOGEN to reduce the number of UCBs, add up to 4 paths to DASD (the MVS/370 maximum, two from each processor), and bring in the CRH function). CRH is of special interest, although it is not involved in the 'problem' discussed here: it provides a way to handle I/O that is physically connected only to the processor that was 'lost' due to a hardware problem.

On my initial attempts it was suggested that I achieve multiple paths by specifying something like:

029F 3390 dasd/tk5res.390
1:029F 3390 localhost::029F

but this caused some intermittent malfunctions. Instead, I now specify:

0F9F 3390 dasd/tk5res.390
0:029F 3390 localhost::0F9F
1:029F 3390 localhost::0F9F

where the 0F9F device is not used from the MVS system. The malfunctions (false EQCs and others) are gone, but instead one can observe occasional 10-second delays.
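
(For reference, a minimal sketch of how such a shared-device arrangement is typically wired up in a Hercules configuration; the SHRDPORT statement and the default port 3990 shown here are assumptions for illustration, not taken from the attached config:)

  # Assumed: enable the built-in shared device server on its default port
  SHRDPORT  3990

  # Local backing device, not defined to the MVS system
  0F9F    3390  dasd/tk5res.390

  # Two paths to 029F (one per channel set), both routed through
  # the local shared device server to 0F9F
  0:029F  3390  localhost::0F9F
  1:029F  3390  localhost::0F9F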

The first unexpected delay occurs already while the configuration is being created: it is 10 seconds per additional path established. I do not know how or what to trace at this point, and there is no I/O from the virtual machine at this time.

In the documentation I found the command msglevel +dasd +channel, but it generates an error message.

Occasionally there is also a 10-second delay when the system is up and running. All cases I have looked into occur when MVS happens to switch from CPU 0 to CPU 1 (or the other way around) for the next I/O to the traced device (which is the sysres volume). PURGE processing (of cached tracks in the DASD sharing scheme) is always active at the time. But when scanning the trace, there are also occasions of switching to the other CPU with no delay.

I also noted that values of 10 seconds appear in the config file, and changed TIMERINT to 5000 and cckd GCINT to 5, but the 10-second delays remained unchanged.
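
(For reference, those two settings would appear roughly as follows in the configuration file; this is only a sketch using the values mentioned above, and the exact CCKD keyword syntax is an assumption:)

  TIMERINT  5000         # timer update interval, in microseconds
  CCKD      GCINT=5      # cckd garbage collector interval, in seconds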

Could this be a tuning problem, with the cache and so on?

The delay during machine establishment runs from 16:23:26 to 16:23:36, and the traced delay while the TSO user is logged on runs from 16:28:24 to 16:28:33. The delay is clearly noticeable to the TSO user, and sometimes console commands are delayed. At 16:25:48 there is an example of a quick switch to the other CPU.

The documentation provided in the DELAY.zip attachment below consists of:

  • The config file
  • The Hercules log
  • The MVS log

The delay during configuration processing should be easy to recreate, since no MVS code is (yet) involved. It might have a different cause, though!

Anders Edlund andersedlund@telia.com

SDL-Hercules-390 deleted a comment from SE20225 on Feb 12, 2024
Fish-Git added the BUG and Researching... labels on Feb 12, 2024
@Fish-Git
Member

> Instead, I now specify:
>
> 0F9F 3390 dasd/tk5res.390
> 0:029F 3390 localhost::0F9F
> 1:029F 3390 localhost::0F9F
>
> where the 0F9F device is not used from the MVS system. The malfunctions (false EQCs and others) are gone, but instead one can observe occasional 10-second delays.
>
> The first unexpected delay occurs already while the configuration is being created: it is 10 seconds per additional path established. I do not know how or what to trace at this point, and there is no I/O from the virtual machine at this time.
>
> The delay during configuration processing should be easy to recreate, since no MVS code is (yet) involved.

I will try to reproduce the startup delay that occurs when Hercules is first started but before any IPL. That should be fairly easy, since all of my test dasds can be dummy/empty volumes given that I won't be IPLing anything.

Fish-Git self-assigned this on Feb 12, 2024
@Fish-Git
Member

GOOD NEWS!

I was able to reproduce the 10-second delay problem and I have a fix for it!

I will be committing it within the next day or so. Possibly later tonight.

Fish-Git added the IN PROGRESS... label and removed the Researching... label on Feb 13, 2024
Fish-Git added a commit that referenced this issue on Feb 14, 2024
Reduce 'select' timeouts to 50 milliseconds.

Closes GitHub Issue #624.
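
(For reference, the general shape of such a change, shown here only as a sketch and not as the actual Hercules source; the function name and loop structure are illustrative. A select() call that previously blocked for up to roughly 10 seconds per poll now wakes every 50 milliseconds, so pending work such as a newly established path is noticed almost immediately:)

  #include <stddef.h>
  #include <sys/select.h>

  /* Illustrative polling wait: block until 'fd' is readable, but wake
     often enough that other pending work is not stalled for the full
     timeout. */
  int wait_for_work( int fd )
  {
      fd_set         readset;
      struct timeval tv;

      FD_ZERO( &readset );
      FD_SET( fd, &readset );

      tv.tv_sec  = 0;            /* formerly on the order of 10 seconds */
      tv.tv_usec = 50 * 1000;    /* now 50 milliseconds                 */

      return select( fd + 1, &readset, NULL, NULL, &tv );
  }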
@Fish-Git
Member

Fixed by commit 79a1488.

Closing.

Fish-Git removed the IN PROGRESS... label on Feb 14, 2024

SE20225 commented Feb 15, 2024

I found the changes you had made in the git repository, manually made the same changes to my source, and then rebuilt Hercules.

Both reported problems are now gone!

I guess the lowered times are not really TIMEOUTs, but rather the time before another attempt. As a TSO user, one no longer notices any unusual delays! Thanks!

@Fish-Git
Member

> Both reported problems are now gone!

That's good to hear!

> I guess the lowered times are not really TIMEOUTs, but rather the time before another attempt.

Correct.

> As a TSO user, one no longer notices any unusual delays! Thanks!

You are very welcome, Anders.  :)
