(pre-)Stop/Kill action/command #9872

sofixa · 2021-01-22T11:20:20Z

Currently, Nomad supports defining a kill signal, and it'd be pretty useful to also be able to define pre-stop and stop/kill actions or commands (we can already do post-stop via tasks with `lifecycle` > `hook` > `poststop`).

The main use case I see for this is shutting down complex software that needs actions performed on it for a graceful shutdown. For example, ScyllaDB recommends running a command (`nodetool drain`) and then shutting down gracefully before the Docker container is killed.

It could also be useful for more graceful drains, for example during rolling upgrades (e.g. failing the health check to make the instance unreachable through Consul/the LB before actually shutting it down).

In theory this could be achieved with an additional hook (prestop), but that might cause some issues. In ScyllaDB's case, for example, the prestop task would need to contain all the tools and configuration required to run commands against the ScyllaDB instance in the main task. It also wouldn't work for this specific case, since ScyllaDB recommends shutting down gracefully via supervisord after draining, and I don't think supervisorctl can be called remotely.
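For comparison, the closest the job spec gets today is choosing which signal the main task receives and how long Nomad waits before force-killing it. A minimal sketch, assuming a Docker-based ScyllaDB task; the group and task names are illustrative. Nothing here can run `nodetool drain` before the signal is sent, which is the gap this issue describes.

```hcl
group "scylla" {
  task "scylla" {
    driver = "docker"

    config {
      image = "scylladb/scylla"
    }

    # Signal Nomad sends when it stops the task.
    kill_signal = "SIGTERM"

    # How long Nomad waits after kill_signal before force-killing the task.
    # The client's max_kill_timeout setting caps this value.
    kill_timeout = "30s"
  }
}
```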

Adrian


shcherbachev · 2021-09-21T16:08:28Z

Hi!

We would love to have the ability to configure a pre-stop script. It would help us implement smoother upgrades that require load-balancer reconfiguration. Right now, our load balancer (driven by Consul services and consul-template) detects that a service is no longer running on a node and forwards traffic to a different node. But this happens after a small delay, and only after the job has been terminated.

If we had a pre-stop script, we could switch traffic on the load balancer before the job starts shutting down. This way we wouldn't have to wait for Consul to detect and propagate the change.
Once the service is back online, we would use the poststart hook to reconfigure the load balancer to use the local service again.
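The poststart half of this workflow can already be expressed. A minimal sketch, assuming a hypothetical load balancer admin endpoint (`lb.example.internal`) with a hypothetical `/backends/enable` path; only the `lifecycle` block and the task structure reflect actual Nomad syntax.

```hcl
task "lb-enable" {
  lifecycle {
    # Starts after the main tasks are running; sidecar = false means it
    # runs once to completion instead of for the life of the allocation.
    hook    = "poststart"
    sidecar = false
  }

  driver = "docker"

  config {
    image   = "curlimages/curl"
    command = "curl"
    # Hypothetical LB endpoint that routes traffic back to this node.
    args = ["-fsS", "-X", "POST", "http://lb.example.internal/backends/enable?node=${node.unique.name}"]
  }
}
```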

Also, prestop is the only hook missing from the family: prestart, poststart, and poststop already exist. Personally, I would add it for the sake of symmetry.

Alex

mikeblum · 2021-12-08T01:55:32Z

This is impacting our ability to kick off connection draining for our HAProxy containers running in Nomad, similar to @shcherbachev's use case. I'll take a look at the post-stop and pre-start code to get an idea of how pre-stop might work. Stay tuned!

mikeblum · 2021-12-12T18:46:34Z

Hi @tgross

I forked Nomad and set up a dev environment (very smooth onboarding; the contributing guide was excellent). After reviewing how pre-start and the other lifecycle hooks are implemented, I have a few questions about the scope of pre-stop:

For reference, here are the docs for lifecycle hooks: https://www.nomadproject.io/docs/job-specification/lifecycle#lifecycle-stanza

blog: https://www.hashicorp.com/blog/hashicorp-nomad-task-dependencies

1. Should we support pre-stop for sidecar tasks?

This section of the structs code points to sidecar support for pre-start. If we implemented pre-stop for a sidecar, would we expect it to block stopping the parent task? Or would it be treated as a non-blocking, optional failure? For example, a pre-stop task with sidecar enabled:

(based on https://www.nomadproject.io/docs/job-specification/lifecycle#init-task-pattern)

```hcl
task "halt-telemetry" {
  lifecycle {
    hook = "prestop"
    sidecar = true
  }

  driver = "exec"
  config {
    command = "sh"
    args = ["-c", "while nc -z telemetry.service.local.consul 8080; do sleep 1; done"]
  }
}

task "main-app" {
  ...
}
```

A use case I could think of would be making sure any buffered logs or other crucial data has been shipped off-box to the telemetry service of choice.

2. Are there any UI components we need to update?

Pre-start/post-stop task hooks have this UX, which is quite nice when there are several lifecycle tasks.

Could this PR encompass just the Go-side changes?

3. How should task kill timeouts be handled?

Example from `nomad job init`:

```hcl
# Controls the timeout between signalling a task it will be killed
# and killing the task. If not set a default is used.
kill_timeout = "20s"
```

In example.nomad, kill_timeout applies to the main task. I imagine we'll want to support it for pre-stop just as it works for post-stop today, but I'm wondering whether there are odd implications to having a kill_timeout on the main and/or pre-stop tasks: who wins?
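For reference, `kill_timeout` is a per-task setting today, so the main task and a poststop task each carry their own value. A minimal sketch with illustrative task names, image, and command; how a future prestop task's timeout would interact with the main task's remains the open question.

```hcl
group "example" {
  task "main-app" {
    driver = "docker"

    config {
      image = "redis:7"
    }

    # Applies only to main-app: time between its kill signal and a force kill.
    kill_timeout = "20s"
  }

  task "after-stop" {
    lifecycle {
      # Starts only after the main task has exited.
      hook = "poststop"
    }

    driver = "docker"

    config {
      image   = "alpine:3"
      command = "sh"
      args    = ["-c", "echo main-app exited"]
    }

    # Independent timeout for this lifecycle task.
    kill_timeout = "5s"
  }
}
```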

Related issues:

Task Lifecycle PostStart Hook: #8366

I'll keep digging into the code but figured I'd pose these higher level Qs to get the 🤔 going.

liemlhdbeatvn · 2022-08-14T15:16:06Z

Our use case is exactly the same as @shcherbachev's. Is there any progress on this?

jrasell · 2022-08-15T09:33:25Z

Hi @liemlhdbeatvn and others on this issue; this is unfortunately not currently on our near-term roadmap. The team will provide updates as soon as there are any.

ljb2of3 · 2022-08-24T14:13:22Z

I just wanted to drop in and say a prestop feature would be very useful for my use case as well.

Due to the architecture of the system I'm working on, it takes about 10 minutes for traffic to stop flowing to a task once it's removed from our load balancer. It would be great if I could have a prestop job that removes it from the load balancer, then sleeps for 10 minutes before allowing the main task to be stopped.

ljb2of3 · 2022-08-24T14:22:19Z

Of course, as I continue reading the docs... it appears that shutdown_delay will actually meet my needs. @shcherbachev and @liemlhdbeatvn would this work for you as well?

https://www.nomadproject.io/docs/job-specification/group#shutdown_delay
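A minimal sketch of the group-level setting, using the 10-minute figure from this use case; the group, service, port, and image names are illustrative. Per the linked docs, the delay runs between service deregistration and sending each task its shutdown signal, and the group-level delay applies to services defined at the group level.

```hcl
group "web" {
  # Wait 10 minutes after deregistering the group's services
  # before signalling its tasks to shut down.
  shutdown_delay = "10m"

  network {
    port "http" {
      to = 8080
    }
  }

  # Group-level service, so the group shutdown_delay applies to it.
  service {
    name = "web"
    port = "http"
  }

  task "web" {
    driver = "docker"

    config {
      image = "example/web:latest" # hypothetical image
      ports = ["http"]
    }
  }
}
```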

With that in mind, I'd still vote that prestop be added for completeness.

aparfeno · 2023-07-11T10:58:09Z

Hello,
First, thank you for a great product. I am adopting it for my use case, a microservice-based warehouse management system.
I wanted to add another voice for this feature.
My use case is:
I have stateful server-client interactions (dialogs with hand-held devices), which I organize with sticky sessions.
During a rolling upgrade, I want to gracefully "transition" these stateful sessions from the node that's shutting down to a new one. This involves warning the user to quickly finish his tasks, waiting for him to do so (i.e. reach points in the code where it is safe, from a business point of view, to kill the user's session), and then moving the session by asking the client to forget the sticky cookie, etc.

It is a complicated song and dance. So far I've run into two problems:

1. Nomad deregisters the service from Consul as soon as the kill signal is dispatched, which is too soon for me.
2. On Windows with Java, I can't catch the Ctrl-Break signal, and Nomad doesn't respect kill_signal on Windows (there is a ticket for that).

So far I am considering all kinds of crutches to work around the problems above.
Instead, these could be solved cleanly if I could tell my app, through a pre-stop script, that it is time to shut down. It would then interact with users, deal with Consul appropriately, etc.

Thank you,
Alex

gjrtimmer · 2024-10-25T17:14:54Z

My use case for this is very simple: I would like to issue the API calls below through curl to gracefully shut down Loki.

Reason: I'm using the ephemeral disk to store the index, cache, WAL, etc. on the server's very fast NVMe drive. Calling these endpoints flushes all log chunks to S3 storage before the container shuts down, so if migrating the ephemeral disk to another node fails, I don't lose any logs.

```
POST /flush
POST /ingester/prepare_shutdown
POST /ingester/shutdown
```
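If a prestop hook existed, these calls could live in their own task. The sketch below is hypothetical: `hook = "prestop"` is not valid Nomad syntax today and follows the syntax proposed earlier in this thread. It also assumes Loki listens on its default HTTP port 3100 and that both tasks share the group's network namespace, so `localhost` reaches Loki.

```hcl
task "loki-flush" {
  lifecycle {
    # HYPOTHETICAL: Nomad does not accept "prestop" today.
    hook = "prestop"
  }

  driver = "docker"

  config {
    image   = "curlimages/curl"
    command = "sh"
    # Call the three Loki endpoints in order, stopping at the first failure.
    args = ["-c", "curl -fsS -X POST http://localhost:3100/flush && curl -fsS -X POST http://localhost:3100/ingester/prepare_shutdown && curl -fsS -X POST http://localhost:3100/ingester/shutdown"]
  }
}
```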

gjrtimmer · 2024-10-25T17:17:22Z

@jrasell Any update on this?

tgross added theme/task lifecycle type/enhancement stage/needs-discussion labels Jan 22, 2021

tgross changed the title from "[Feature request] (pre-)Stop/Kill action/command" to "(pre-)Stop/Kill action/command" Jan 22, 2021

david-yu added the hcc/jira label Jun 19, 2024

pkazmierczak assigned pkazmierczak and unassigned pkazmierczak Jul 5, 2024
