run_command: wait for command to run indefinitely #59

Merged 1 commit into danobi:master from test_qga_not_timeout on Jan 26, 2024

Collaborator

@chantra chantra commented Jan 25, 2024

Fixes #40

When both the host and the VM are saturated, the VM may not be scheduled for a while, causing the connection to qga to time out.
Prior to #27 the unix socket would block indefinitely.

This change brings back that behaviour while a command is running, then restores the timeout to the value it had before the command started. This gives the host/VM a chance to recover.
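
The save, block, and restore pattern described above could be sketched roughly as follows. This is an illustration only, not the code from this PR: `with_blocking_reads` is a hypothetical helper name, and a `UnixStream::pair()` stands in for the real qga socket so that the example runs on its own.

```rust
use std::io;
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// Hypothetical helper (not part of vmtest): runs `f` with the stream's read
/// timeout disabled, so reads block indefinitely, then restores the timeout
/// that was configured before the call.
fn with_blocking_reads<T>(
    stream: &UnixStream,
    f: impl FnOnce(&UnixStream) -> io::Result<T>,
) -> io::Result<T> {
    // Remember the current timeout so it can be put back afterwards.
    let previous = stream.read_timeout()?;
    // `None` means "no timeout": reads block until data arrives.
    stream.set_read_timeout(None)?;
    let result = f(stream);
    // Restore the previous timeout whether or not `f` succeeded.
    stream.set_read_timeout(previous)?;
    result
}

fn main() -> io::Result<()> {
    // A socket pair stands in for the host side of the qga connection.
    let (host, _guest) = UnixStream::pair()?;
    host.set_read_timeout(Some(Duration::from_secs(5)))?;

    // While the "command" runs, the timeout is disabled.
    let during = with_blocking_reads(&host, |s| s.read_timeout())?;
    assert_eq!(during, None);

    // Afterwards, the original timeout is back in place.
    assert_eq!(host.read_timeout()?, Some(Duration::from_secs(5)));
    Ok(())
}
```

Restoring the timeout in the helper rather than at each call site keeps the shorter timeout in effect for every other exchange on the socket.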

Tested by running:

stress --cpu 256 --io 256 --vm 4 --vm-bytes 1024M --timeout 1000s

in the host, and

stress --cpu 512 --io 512 --vm 4 --vm-bytes 1024M --timeout 100s & while true; do date; sleep 2; done

in the guest.

The output shows that the VM struggled to run its loop and print the date until the stress tool was done:

===> Setting up VM
[    1.840931] 9pnet: Limiting 'msize' to 512000 as this is the maximum supported by transport virtio
===> Running command
Thu Jan 25 01:43:12 PM PST 2024
stress: info: [85] dispatching hogs: 512 cpu, 512 io, 4 vm, 0 hdd
[    5.493242] hrtimer: interrupt took 3387058 ns
[   89.331664] clocksource: timekeeping watchdog on CPU1: hpet wd-wd read-back delay of 31697570ns
[   89.465647] clocksource: wd-tsc-wd read-back delay of 2998850ns, clock-skew test skipped!
Thu Jan 25 01:45:00 PM PST 2024
Thu Jan 25 01:45:09 PM PST 2024
Thu Jan 25 01:45:12 PM PST 2024
stress: info: [85] successful run completed in 124s
Thu Jan 25 01:45:17 PM PST 2024
Thu Jan 25 01:45:19 PM PST 2024
Thu Jan 25 01:45:21 PM PST 2024
Thu Jan 25 01:45:23 PM PST 2024
Thu Jan 25 01:45:26 PM PST 2024
Thu Jan 25 01:45:28 PM PST 2024

Signed-off-by: Manu Bretelle <chantr4@gmail.com>

Owner

@danobi danobi left a comment

seems reasonable to me. (I didn't spend much time thinking deeply about implications tho). @DolceTriade do you have any input?

Collaborator Author

chantra commented Jan 26, 2024

> seems reasonable to me. (I didn't spend much time thinking deeply about implications tho). @DolceTriade do you have any input?

We could possibly make the timeout configurable through an argument/config, but I did not find any reasonable default value for it. On the other hand, if one cares about a process running for too long, the responsibility of killing vmtest + qemu can perhaps be left to the `timeout` command.

Contributor

@DolceTriade DolceTriade left a comment

seems reasonable to me as well.

Collaborator Author

chantra commented Jan 26, 2024

ok, I am going to land this then. We can revisit later if we need to set a timeout. Thanks all.

@chantra chantra merged commit 5b65b7e into danobi:master Jan 26, 2024
1 check passed
@chantra chantra deleted the test_qga_not_timeout branch January 26, 2024 21:43
Development

Successfully merging this pull request may close these issues.

Common flakiness caused by Failed to QGA guest-exec-status
3 participants