Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timeout parameter for cloud-init --wait #4059 #4717

Conversation

FoRtY-5
Copy link

@FoRtY-5 FoRtY-5 commented Dec 23, 2023

Proposed Commit Message

feature: timeout for cloud-init status --wait command

Added `--timeout` flag for `cloud-init status --wait` command to terminate after 
declared amount of seconds.

Fixes GH-#4059
LP: #1999876

Additional Context

Example usage cloud-init status --wait --timeout 5
Time to terminate provided in timeout flag might not be 1:1 with real time in seconds because of the time library behavior (I suppose), the implemented solution meant to be clear and readable and also fulfills requirements assumed based on the context from https://irclogs.ubuntu.com/2022/12/15/%23cloud-init.html
#4059

Test Steps

cd cloud-init/cmd
python main.py status --wait --timeout 10

Checklist

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

@holmanb
Copy link
Member

holmanb commented Dec 27, 2023

I'm still not convinced that this is a real value add. The user can already do this themselves with cloud-init status in a loop.

If a timeout happens, the user is going to have to do error handling based on the return code anyways (otherwise their code will be making invalid assumptions). Is that really so hard to do for the typical user of this flag?

I'm not going to say we can't do this, but I am hesitant to add this, since the real users of this are automated scripts, which I would assume have the ability to loop since they also need the ability to do conditional execution.

Thoughts?

@FoRtY-5
Copy link
Author

FoRtY-5 commented Jan 2, 2024

I think that based on the conversation it would be handy for testers to use the cloud-init status --wait with the --timeout flag - without the --timeout flag the --wait flags blocks the execution of the test until status changes.
In my opinion it maybe makes sense for testers to have a simple one line command cloud-init status --wait --timeout SECONDS, so it makes their job a little bit easier(?)

With that being said I would consider adding this small enhancement to the main branch - personally I don't see a lot of value added to the final product, maybe life of some testers would be a little bit easier after this change.
Consider closing this issue after deciding what to do with this PR.

@holmanb
Copy link
Member

holmanb commented Jan 3, 2024

Thanks for contributing, @FoRtY-5. Please add yourself to the cla. Also, this adds code but no tests. Please add test coverage in tests/integration_tests/cmd/test_status.py and tests/unittests/cmd/test_status.py.

In my opinion it maybe makes sense for testers to have a simple one line command cloud-init status --wait --timeout SECONDS, so it makes their job a little bit easier(?)

Allowing them to add a flag rather than write their own timeout look on cloud-init status makes their job easier. It just also adds code that we have to maintain. And if we're going to give users a new feature lets at least document how to use it correctly.

The point that I'm trying to make is that every caller of this flag will absolutely need to know whether this command exited because of timeout or because cloud-init has completed. This call only adds value when the user is able to discern whether cloud-init has timed out or not, and currently in your implementation, there is no way to signal this to the caller without parsing stdout. If status is called with --timeout 5 and the call took 5 seconds, did the call timeout or did cloud-init actually complete at 5 seconds (which is entirely possible)? Lets give the user of this new feature an example of how to use it correctly.

@@ -130,6 +130,13 @@ def get_parser(parser=None):
default=False,
help="Block waiting on cloud-init to complete",
)
parser.add_argument(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since --timeout without --wait a useless option, lets do something about that.

"--timeout",
type=int,
nargs=1,
help="Timeout in seconds for --wait flag - use to avoid forever wait",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text - use to avoid forever wait" is not wrong, but also not useful. A timeout obviously avoids a forever wait. More helpful would be a one sentence explanation of how to use this feature.

@@ -139,15 +146,26 @@ def handle_status_args(name, args) -> int:
paths = read_cfg_paths()
details = get_status_details(paths, args.wait)
if args.wait:
if args.timeout:
timeout_secs = args.timeout[0] * 0.01
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach requires multiple syscalls to get current time per loop, it reads confusingly, and it requires sprinkling timekeeping calls in different parts of the code

        if args.timeout:
            timeout_secs = args.timeout[0] * 0.01
...
        while details.status in (
            UXAppStatus.NOT_RUN,
            UXAppStatus.RUNNING,
            UXAppStatus.DEGRADED_RUNNING,
        ):
            if args.timeout:
                start_time = time()
...
            if args.timeout:
                timeout_secs -= time() - start_time
                if timeout_secs < 0:
                    break

how about:

        start_time = time()
        while details.status in (
            UXAppStatus.NOT_RUN,
            UXAppStatus.RUNNING,
            UXAppStatus.DEGRADED_RUNNING,
        ):
            # the user requested that `cloud-init status` exit early
            if args.timeout and (time() - start_time) > args.timeout[0]:
                break

@@ -84,3 +84,12 @@ contain any of the following states:

See :ref:`our explanation of failure states<failure_states>` for more
information.

Cloud-init status --wait
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something more descriptive than the command name. I think that it would be best if we have a title that matches what a user might be searching for.

Suggested change
Cloud-init status --wait
How to block until cloud-init has completed

@@ -84,3 +84,12 @@ contain any of the following states:

See :ref:`our explanation of failure states<failure_states>` for more
information.

Cloud-init status --wait
-----------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An empty line required after this line.


Cloud-init status --wait
-----------------
To wait for the status to be fetched use ``cloud-init status --wait``: script will wait forever
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the command cloud-init status --wait itself fetches a status, the phrase "To wait for the status to be fetched" is ambiguous. Are we waiting to fetch the status from cloud-init or are we waiting to fetch the status that cloud-init has completed?

Additionally "script will wait forever for the status response." is misleading, because this rarely happens. Therefore, describing this in a conditional way is less confusion: : when cloud-init cannot complete (i.e., due to a bug in the cloud provider or init system), the script may never return.

To wait for the status to be fetched add the ``--timeout`` flag to set desired time for script to
terminate in seconds e.g. ``--timeout 5``. **Using '--timeout' flag will exit without finishing the configuration if
there are no status changes**

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this feature is worth adding at all, then it is certainly worth describing how to use it properly. Lets include an example of how to tell whether cloud-init status --wait --timeout 5 timed out or whether it actually completed, since that is something that every caller of this command will need to know.

Copy link

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.)

@github-actions github-actions bot added the stale-pr Pull request is stale; will be auto-closed soon label Jan 23, 2024
@github-actions github-actions bot closed this Jan 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale-pr Pull request is stale; will be auto-closed soon
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants