[FEATURE] Restart running streaming job to redeploy configuration changes #389
@MarekLani what you describe here is actually expected behavior for all Terraform resources. Are you sure you're using it the right way? What's the use case?
This can be done by tainting the resource: on your next apply, the resource will be recreated with a new ID. As far as the API goes, all the fields are editable and updated in place, so AFAIK there is no field that forces a recreate. Tainting is your best option.
@nfx @stikkireddy thanks for the replies. I ended up using taint from the command line. Nevertheless, when I want to have a build-specific part of the setup, resources are usually torn down and redeployed if I change their name. Of course, I understand that the Databricks Jobs API does not work that way: it allows editing the name without re-creating the job. However, if I want to recreate the job from the Terraform file, I basically have no option to do that. The scenario is that I have a Databricks streaming job which runs continuously and is set to 1 concurrent run. If I make any change to the job's Terraform configuration, or to the DBFS file the job is based on, I have no way to get these changes into the already-running job, or to force the job run to restart from Terraform. I also should not use the Databricks API to re-create the job, because the Terraform state would get out of sync. And as there is no stop/restart job command in the Databricks API, I am simply unable to create a new run of the job with the new configuration without re-creating the job. I know there is a reset command in the Databricks API, but it requires the JSON configuration of the job to be passed as a parameter, which would again put me out of sync with the Terraform state.
@MarekLani thanks for more context. If my understanding is correct, you're building a CD process with Terraform for a streaming job. This is a feature request rather than a bug.
If the artifact has a different name, e.g. including a version, we can implement a graceful job restart (if it's running) with additional fields. We won't be re-creating the job, because the job run history would be lost, and that's not the logical behavior. Please provide more information.
Thanks @nfx, and sorry, I might have changed this entry to a feature request when creating it. I am using spark_python_task referencing a DBFS file created this way:
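(The Terraform snippet was not captured in this thread; below is a minimal sketch of such a setup. Resource names, paths, and cluster settings are illustrative assumptions, and attribute names may differ between provider versions.)

```hcl
# Upload the job's entrypoint script to DBFS; Terraform tracks the file
resource "databricks_dbfs_file" "main" {
  source = "${path.module}/src/main.py"
  path   = "/mnt/jobs/main.py"
}

# A continuously running streaming job limited to a single concurrent run
resource "databricks_job" "streaming" {
  name                = "streaming-job"
  max_concurrent_runs = 1

  spark_python_task {
    python_file = "dbfs:${databricks_dbfs_file.main.path}"
  }

  new_cluster {
    num_workers   = 2
    spark_version = "7.3.x-scala2.12"
    node_type_id  = "Standard_DS3_v2"
  }
}
```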
Can you please share a bit more on how the version might be applied? It sounds like an ideal solution. Also, I didn't realize the loss of history; that is indeed not desired behavior.
@MarekLani and does
It makes use of PyPI and Maven libraries, but the core processing logic is only in the main.py.
Terraform needs to know whether something has changed or not. Can you upload each new version of the notebook with a different suffix? Then the parameter I'm thinking to introduce is
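A sketch of the suffix idea, assuming a `build_id` variable supplied by the CD pipeline (the variable name and paths are illustrative assumptions):

```hcl
variable "build_id" {
  type = string
}

# Embedding the build ID in the DBFS path makes every deployment a new
# file, so Terraform always detects a change in the referencing job
resource "databricks_dbfs_file" "main" {
  source = "${path.module}/src/main.py"
  path   = "/mnt/jobs/main-${var.build_id}.py"
}
```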
Yes, I can include part of the build ID as a version. If I can cast my vote, I would use always_running. So the idea is that this would enable a graceful restart?
Something like that. I'll check with a couple more folks, because we might need exactly the same "virtual option" for always-on clusters (e.g. I see a pattern that auto-starts clusters for business hours).
Hi @MarekLani, somewhere after 0.3.0 we could do a restart of jobs if the underlying file changes (thanks to changes in databricks_dbfs_file from #417). How graceful does the restart of a streaming job have to be? What happens if you press cancel on the streaming job UI? Does the stream gracefully end? If so, then it might be relatively easy to implement.
Hi @nfx, thanks for the response. Pardon my delay, but as I am responsible purely for the CD part of the project, I needed to connect with the rest of my team working on the actual job logic. We need to understand the behavior of the libraries we are using to connect to the queues, and how much resiliency they provide out of the box, and will get back to you. However, my guess would be that your suggested approach should be enough.
So @nfx, the approach with cancelling the job you described should be fine for us. We are connecting to Azure Event Hubs, and the libraries we use should have resilient checkpointing implemented: the position is not checkpointed until DataFrame processing is finished, which meets our at-least-once delivery requirement. However, just out of curiosity, I would be interested to hear your ideas on whether a more graceful approach would be possible and what it would take. This probably comes down to what interfaces the Databricks API offers at this point. Thank you.
@MarekLani if you are still trying to restart a job after the configuration changes, what you are actually trying to do is:
1. Find the currently active run of the job (runs list)
2. Cancel that run (runs cancel)
3. Trigger a new run that picks up the updated configuration (run now)
Something you may be able to do is use a null resource and trigger it from your databricks_job configs: https://registry.terraform.io/providers/hashicorp/null/latest/docs/resources/resource With a null resource and a trigger that listens to the appropriate configs of your job that you expect to change, you can set up a local provisioner that runs a Python script executing the three steps above via the Runs API: https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-cancel At this point in time, this provider does not track or manage the state of runs, only the job-level config, and AFAIK there is nothing in the platform that automatically restarts a run when its config changes.
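A rough sketch of that workaround, assuming a job resource named `databricks_job.streaming` and a hypothetical `restart_job.py` helper that calls the runs/list, runs/cancel, and run-now endpoints (both names are illustrative, not part of the provider):

```hcl
resource "null_resource" "restart_on_change" {
  # Re-run the provisioner whenever the watched job config changes
  triggers = {
    python_file = databricks_job.streaming.spark_python_task[0].python_file
  }

  # The helper script cancels the active run and triggers a new one
  provisioner "local-exec" {
    command = "python restart_job.py ${databricks_job.streaming.id}"
  }
}
```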
@stikkireddy thank you, this sounds like a fine approach to restart the job only when needed, instead of doing it on each run of Terraform. However, I am still leaving this one open, as it would be nice to allow for a restart directly within the Terraform resource, without needing to touch the Databricks API.
Hopefully I can piggyback on this thread and ask a related question. I have the same use case as above (a streaming job with max concurrent runs = 1) but I would also like to be able to trigger a run of the job on a fresh deployment. Would this also be included in the v0.4.0 milestone? Should I file another feature request? |
@travisemichael just to confirm: a fresh deployment would mean a fresh JAR/Python file/notebook with a different name, or eventually a notebook project with a newer version, correct? Tagging @lennartcl to see if we can get more graceful streaming stops than those provided by
@nfx What I mean by a fresh deployment is a brand new job. It would be nice to have it set up so that on the first deploy the job starts a single run, and on subsequent deploys the previous run is restarted as described above.
@travisemichael makes sense for
* Implements feature #389
* Functionality is triggered only if `always_running` attribute is present
* Uses RunsList, RunsCancel and RunNow methods from Jobs API
@travisemichael @MarekLani I've started working on this feature in the linked pull request.
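Based on the pull request description, usage would presumably look like the following; the attribute was still under development at the time, so treat this as a sketch of the proposed syntax rather than final documentation:

```hcl
resource "databricks_job" "streaming" {
  name                = "streaming-job"
  max_concurrent_runs = 1

  # Proposed attribute: when the job configuration changes, cancel the
  # active run and immediately start a new one with the updated config
  always_running = true

  spark_python_task {
    python_file = "dbfs:/mnt/jobs/main.py"
  }
}
```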
Hello,
I am trying to force re-creation of a job, but apparently there is no single field in the databricks job resource whose edited value would cause re-creation of the job on the Databricks side. Even the job name can be edited without redeploying the job. Would it be possible to add a field used only for the needs of Terraform, whose value would affect re-creation of the Databricks job?
Terraform Version
N/A
Affected Resource(s)
databricks_job
Expected Behavior
When I want to have a build-specific part of the setup, resources are usually torn down and redeployed if I change their name. Of course, I understand that the Databricks Jobs API does not work that way: it allows editing the name without re-creating the job. However, if I want to recreate the job from the Terraform file, I basically have no option to do that. The scenario is that I have a Databricks streaming job which runs continuously and is set to 1 concurrent run. If I make any change to the job's Terraform configuration, or to the DBFS file the job is based on, I have no way to get these changes into the already-running job, or to force the job run to restart from Terraform. I also should not use the Databricks API to re-create the job, because the Terraform state would get out of sync. And as there is no stop/restart job command in the Databricks API, I am simply unable to create a new run of the job with the new configuration without re-creating the job. I know there is a reset command in the Databricks API, but it requires the JSON configuration of the job to be passed as a parameter, which would again put me out of sync with the Terraform state.
Actual Behavior
There is no way to force re-creation of the job just from the Terraform file. A workaround is to use the terraform taint command from the command line, which forces a redeploy of the resource; nevertheless, I would like to avoid this approach.
Steps to Reproduce
N/A