New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Improve ExperimentData handling of jobs and analysis #599

Merged

chriseclectic merged 7 commits into qiskit-community:main from chriseclectic:job-futures

Jan 20, 2022

Collaborator

chriseclectic commented Jan 7, 2022 •

edited

Loading

Summary

Depends on #596
Fixes #573, #592

This reworks how analysis callbacks are run to use separate Futures for each callback and improves handling of futures for jobs and callbacks in ExperimentData.

Details and comments

This also adds some convenience methods to ExperimentData:

jobs: returns a list of all jobs added to the experiment. This can be used to monitor specific jobs for example.
job_status returns the status of all job execution. This is similar to existing status, but will return DONE if all jobs are finished regardless of current callback status.
callback_status returns the status of callback execution. This will be QUEUED if callbacks are still waiting for jobs to finish running, RUNNING if at least one callback is still running, CANCELLED or ERROR if any callbacks were cancelled or had an error, or DONE if all callbacks have finished successfully.
cancel_callbacks cancels any queued callback futures that have not started running yet. Since callbacks run on a single-worker thread pool executor only the currently running callback future cannot be cancelled.
cancel cancel all jobs and callbacks.
callback_errors returns string of any errors encountered during callback execution

Other methods are updated:

status: Returns "EMPTY" instead of "DONE" if no jobs, data, or callbacks are contained in the experiment data.
errors has improved reporting for job and callback errors encountered.

chriseclectic added the Changelog: New Feature label

chriseclectic changed the title ~~Improve ExperimentData handling of job futures~~ Improve ExperimentData handling of callbacks and futures

chriseclectic requested a review from nkanazawa1989

January 10, 2022 16:24

chriseclectic added the Changelog: Bugfix label

This was referenced Jan 10, 2022

Improve handling of analysis callbacks #598

Closed

Make ExperimentData serializable #604

Merged

nkanazawa1989 requested changes

View reviewed changes

Collaborator

nkanazawa1989 left a comment

Halfway through the review. Feel free to start update. I'll review rest of changes tomorrow.

qiskit_experiments/database_service/db_experiment_data.py Outdated Show resolved Hide resolved

qiskit_experiments/database_service/db_experiment_data.py Outdated Show resolved Hide resolved

qiskit_experiments/database_service/db_experiment_data.py Show resolved Hide resolved

qiskit_experiments/database_service/db_experiment_data.py Outdated Show resolved Hide resolved

qiskit_experiments/database_service/db_experiment_data.py Show resolved Hide resolved

qiskit_experiments/database_service/db_experiment_data.py Outdated Show resolved Hide resolved

qiskit_experiments/database_service/db_experiment_data.py Show resolved Hide resolved

qiskit_experiments/database_service/db_experiment_data.py Outdated Show resolved Hide resolved

nkanazawa1989 reviewed

View reviewed changes

Collaborator

nkanazawa1989 left a comment

Thanks Chris this PR looks awesome. The logic looks really clean and easy to read. Do you think you can write unittest for #573 before closing it (currently the test for status is too obvious since it just checks status that is manually overridden)? I think this test is bit tough since the event is hard to reproduce.

qiskit_experiments/database_service/db_experiment_data.py Outdated Show resolved Hide resolved

qiskit_experiments/database_service/db_experiment_data.py Show resolved Hide resolved

qiskit_experiments/database_service/db_experiment_data.py

@@ @@ -779,13 +833,6 @@ def _save_experiment_metadata(self) -> None: @@
                           LOG.warning("Experiment cannot be saved because backend is missing.")
                           return
-                      with self._job_futures.lock:

Collaborator

nkanazawa1989 Jan 13, 2022

This is removed because now _wait_for_analysis method will wait for job?

qiskit_experiments/database_service/db_experiment_data.py

		@@ -862,8 +909,8 @@ def save(self) -> None:

		if self.verbose:

Collaborator

nkanazawa1989 Jan 13, 2022

also analysis status should be done here (i.e. no error).

qiskit_experiments/database_service/db_experiment_data.py Show resolved Hide resolved

qiskit_experiments/database_service/db_experiment_data.py

-                      If the experiment consists of multiple jobs, the returned status is mapped
-                      in the following order:
+                      with self._analysis_futures.lock and self._job_futures.lock:

Collaborator

nkanazawa1989 Jan 13, 2022

We don't need to shutdown executor when not_done_* is empty?

Collaborator Author

chriseclectic Jan 13, 2022

The executor doesn't need to be shutdown, it can sit around forever waiting to make more futures if needed.

Collaborator

nkanazawa1989 Jan 15, 2022

Then we expect the assigned resources in the analysis executor of each experiment data instance will be eliminated by garbage collection (we can come back to this later when we start to get memory issue as the size of parallel experiment scales)?

qiskit_experiments/database_service/db_experiment_data.py Show resolved Hide resolved

qiskit_experiments/database_service/db_experiment_data.py Outdated Show resolved Hide resolved

qiskit_experiments/database_service/db_experiment_data.py Show resolved Hide resolved

chriseclectic force-pushed the job-futures branch from e6a8f66 to 1f1510f Compare

January 13, 2022 19:53

chriseclectic changed the title ~~Improve ExperimentData handling of callbacks and futures~~ Improve ExperimentData handling of jobs and analysis

nkanazawa1989 reviewed

View reviewed changes

Collaborator

nkanazawa1989 left a comment

Minor comments for new commits. Previous updates are really nice.

qiskit_experiments/database_service/db_experiment_data.py

-                          )
+                      # Add future for cancelling jobs that timeout
+                      if timeout_ids:
+                          self._job_executor.submit(self._timeout_running_jobs, timeout_ids, timeout)

Collaborator

nkanazawa1989 Jan 15, 2022

Nice. I like this logic 💯

qiskit_experiments/database_service/db_experiment_data.py Show resolved Hide resolved

qiskit_experiments/database_service/db_experiment_data.py Outdated

+                      Args:
+                          jobs: The Job or list of Jobs to add result data from.
+                          timeout: Optional, time to wait for jobs to finish before

Collaborator

nkanazawa1989 Jan 15, 2022

Please add time unit for the value.

qiskit_experiments/database_service/db_experiment_data.py

+                      if waited.not_done:
+                          LOG.debug("Cancelling running jobs that exceeded add_jobs timeout.")
+                          done_ids = {fut.result()[0] for fut in waited.done}
+                          notdone_ids = [jid for jid in job_ids if jid not in done_ids]

Collaborator

nkanazawa1989 Jan 15, 2022

just minor comment; why we cannot simply write self.cancel_jobs([fut.result[0] for fut in waited.not_done])? another comment; do you think this can be simplified with as_complete function?

Collaborator Author

chriseclectic Jan 19, 2022

Not done futures won't have a result, the job id is only returned when the future finishes. This is why it can be extracted only from the done futures and the reason for the above work around to figure out the not-done job ids.

qiskit_experiments/database_service/db_experiment_data.py

+                              self._add_job_data(job)
+                          else:
+                              # Add job results asynchronously
+                              self._add_job_future(job)

Collaborator

nkanazawa1989 Jan 15, 2022

We don't need to retrieve with timeout and call add_job here?

Collaborator Author

chriseclectic Jan 19, 2022

Changing to calling add_jobs would raise a bunch of warnings about existing job ids at the moment. It could be good to think about changing later so that backend+service can be extracted from job for expdata saved/loaded to JSON, since the backend/provider cant be serialized/deserialized, but that might be better to treat in the serialization PR.

qiskit_experiments/database_service/db_experiment_data.py

-                      If the experiment consists of multiple jobs, the returned status is mapped
-                      in the following order:
+                      with self._analysis_futures.lock and self._job_futures.lock:

Collaborator

nkanazawa1989 Jan 15, 2022

Then we expect the assigned resources in the analysis executor of each experiment data instance will be eliminated by garbage collection (we can come back to this later when we start to get memory issue as the size of parallel experiment scales)?

releasenotes/notes/expdata-futures-87a2ff561375e22b.yaml

+                  pending analysis callbacks. Note that analysis callbacks that have already
+                  started running cannot be cancelled.
+                - |
+                  Adds :meth:`.ExperimentData.cancel` to cancel both jobs and analysis.

Collaborator

nkanazawa1989 Jan 15, 2022

also need description for timeout arg in run method and and add_job method.

releasenotes/notes/expdata-futures-87a2ff561375e22b.yaml Outdated

+                  The ``timeout`` kwarg of :meth:`.ExperimentData.add_data` has been deprecated.
+                  To timeout waiting for job results use the ``timeout`` kwarg with
+                  :meth:`.ExperimentData.block_for_results` or
+                  :meth:`.ExperimentData.analysis_results` instead.

Collaborator

nkanazawa1989 Jan 15, 2022

Also need deprecation of adding job to add_data method.

chriseclectic added 6 commits

January 19, 2022 13:41


          Improve analysis callbacks

b03f9a4


          Improve handling of job data futures

61adba2


          Improve futures logging and waiting

414c91e


          Rename callback methods to use analysis naming


          Add ExperimentStatus and AnalysisStatus enums

cbe7bcb


          Address review comments

c0c0d0a

* Add `add_jobs` method to ExperimentData
* Add timeout functionality to add_jobs and Experiment.run that allows an experiment to set a timeout before cancelling all non-finished jobs
* Adds status enum as ExperimentData class variable for easier access for value comparisons

chriseclectic force-pushed the job-futures branch from 1f1510f to 1563dd2 Compare

January 19, 2022 19:41


          Review comments add tests

06d96ab

* This PR improves ExperimentData handling of qiskit job data and analysis callbacks to make querying job and analysis statuses and cancelling jobs or analysis processes more robust.
* Changes `ExperimentData.status` to return `ExperimentStatus enum value
* Adds `ExperimentData.job_status` method  and `JobStatus` enum return type
* Adds` ExperimentData.analysis_status` method and `AnalysisStatus enum return type.
* Adds `ExperimentData.cancel_jobs`, `ExperimentData.cancel_analysis` and `ExperimentData.cancel` methods for cancelling running jobs, queued analysis, or both.
* Adds `ExperimentData.add_jobs` method for adding job data. Deprecates adding job data via `add_data` method.

chriseclectic force-pushed the job-futures branch from 1563dd2 to 06d96ab Compare

January 19, 2022 19:47

chriseclectic added Changelog: API Change Changelog: Deprecation and removed Changelog: Bugfix labels

nkanazawa1989 approved these changes

View reviewed changes

Collaborator

nkanazawa1989 left a comment

LGTM, This PR allows us to handle callback and status more safely. Thanks Chris!

chriseclectic merged commit f229944 into qiskit-community:main

nkanazawa1989 mentioned this pull request

Update unittest with run success check #568

Merged

chriseclectic mentioned this pull request

Automatic experiment monitor #521

Open

chriseclectic deleted the job-futures branch

March 3, 2022 22:44

paco-ri pushed a commit to paco-ri/qiskit-experiments that referenced this pull request


          Improve ExperimentData handling of jobs and analysis (qiskit-communit…

cf269d0

…y#599)

This PR improves ExperimentData handling of qiskit job data and analysis callbacks to make querying job and analysis statuses and cancelling jobs or analysis processes more robust.

It also includes some API changes to ExperimentData:
* Changes `ExperimentData.status` to return `ExperimentStatus enum value
* Adds `ExperimentData.job_status` method  and `JobStatus` enum return type
* Adds` ExperimentData.analysis_status` method and `AnalysisStatus enum return type.
* Adds `ExperimentData.cancel_jobs`, `ExperimentData.cancel_analysis` and `ExperimentData.cancel` methods for cancelling running jobs, queued analysis, or both.
* Adds `ExperimentData.add_jobs` method for adding job data. Deprecates adding job data via `add_data` method.

yaelbh mentioned this pull request

Analysis runs in spite of job failure #866

Closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Changelog: API Change Changelog: Deprecation Changelog: New Feature