Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature/detailed-status #451

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/push-pr_workflow.yml
Original file line number Diff line number Diff line change
Expand Up @@ -129,7 +129,7 @@ jobs:

- name: Run pytest over unit test suite
run: |
python3 -m pytest tests/unit/
python3 -m pytest -v --order-scope=module tests/unit/

- name: Run integration test suite for local tests
run: |
Expand Down
25 changes: 22 additions & 3 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,34 +6,46 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]
### Added
- Pytest fixtures in the `conftest.py` file of the integration test suite
- NOTE: an export command `export LC_ALL='C'` had to be added to fix a bug in the WEAVE CI. This can be removed when we resolve this issue for the `merlin server` command
- Tests for the `celeryadapter.py` module
- New CeleryTestWorkersManager context to help with starting/stopping workers for tests
- A new command `merlin detailed-status` that displays task-by-task status information about your study
- This has options to filter by return code, task queues, task statuses, and workers
- You can set a limit on the number of tasks to display
- There are 3 options to modify the output display
- New file `merlin/study/status.py` dedicated to work relating to the status command
- Contains the Status and DetailedStatus classes
- New file `merlin/study/status_renderers.py` dedicated to formatting the output for the detailed-status command
- New file `merlin/common/dumper.py` containing a Dumper object to help dump output to outfiles
- Study name and parameter info now stored in the DAG and MerlinStep objects
- Added functions to `merlin/display.py` that help display status information:
- `display_task_by_task_status` handles the display for the `merlin detailed-status` command
- `display_status_summary` handles the display for the `merlin status` command
- `display_progress_bar` generates and displays a progress bar
- Added new methods to the MerlinSpec class:
- get_worker_step_map()
- get_queue_step_relationship()
- get_tasks_per_step()
- get_step_param_map()
- Added methods to the MerlinStepRecord class to mark status changes for tasks as they run (follows Maestro's StepRecord format mostly)
- Added methods to the Step class:
- establish_params()
- name_no_params()
- Added a property paramater_labels to the MerlinStudy class
- Added two new utility functions:
- dict_deep_merge() that deep merges two dicts into one
- ws_time_to_dt() that converts a workspace timestring (YYYYMMDD-HHMMSS) to a datetime object
- A new celery task `condense_status_files` to be called when sets of samples finish
- Added a celery config setting `worker_cancel_long_running_tasks_on_connection_loss` since this functionality is about to change in the next version of celery
- Tests for the Status class
- Tests for the Status and DetailedStatus classes
- this required adding a decent amount of test files to help with the tests; these can be found under the tests/unit/study/status_test_files directory

### Changed
- Reformatted the entire `merlin status` command
- Now accepts both spec files and workspace directories as arguments
- e.g. "merlin status hello.yaml" and "merlin status hello_20230228-111920/" both work
- Removed the --steps flag
- Replaced the --csv flag with the --dump flag
- This will make it easier in the future to adopt more file types to dump to
- New functionality:
- Shows step_by_step progress bar for tasks
- Displays a summary of task statuses below the progress bar
Expand All @@ -45,8 +57,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Moved `verify_filepath` and `verify_dirpath` from `merlin/main.py` to `merlin/utils.py`

### Fixed
- The `merlin status` command so that it's consistent in its output whether using redis or rabbitmq as the broker
- The `merlin monitor` command will now keep an allocation up if the queues are empty and workers are still processing tasks
- Add the restart keyword to the specification docs
- Cyclical imports and config imports that could easily cause ci issues

## [1.11.1]
### Fixed
- Typo in `batch.py` that caused lsf launches to fail (`ALL_SGPUS` changed to `ALL_GPUS`)

## [1.11.0]
### Added
- New reserved variable:
Expand Down
16 changes: 8 additions & 8 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down Expand Up @@ -87,7 +87,7 @@ install-dev: virtualenv install-merlin install-workflow-deps
# tests require a valid dev install of merlin
unit-tests:
. $(VENV)/bin/activate; \
$(PYTHON) -m pytest $(UNIT); \
$(PYTHON) -m pytest -v --order-scope=module $(UNIT); \


# run CLI tests - these require an active install of merlin in a venv
Expand Down Expand Up @@ -135,9 +135,9 @@ check-flake8:

check-black:
. $(VENV)/bin/activate; \
$(PYTHON) -m black --check --line-length $(MAX_LINE_LENGTH) --target-version py36 $(MRLN); \
$(PYTHON) -m black --check --line-length $(MAX_LINE_LENGTH) --target-version py36 $(TEST); \
$(PYTHON) -m black --check --line-length $(MAX_LINE_LENGTH) --target-version py36 *.py; \
$(PYTHON) -m black --check --line-length $(MAX_LINE_LENGTH) --target-version py38 $(MRLN); \
$(PYTHON) -m black --check --line-length $(MAX_LINE_LENGTH) --target-version py38 $(TEST); \
$(PYTHON) -m black --check --line-length $(MAX_LINE_LENGTH) --target-version py38 *.py; \


check-isort:
Expand Down Expand Up @@ -179,9 +179,9 @@ fix-style:
$(PYTHON) -m isort -w $(MAX_LINE_LENGTH) $(MRLN); \
$(PYTHON) -m isort -w $(MAX_LINE_LENGTH) $(TEST); \
$(PYTHON) -m isort -w $(MAX_LINE_LENGTH) *.py; \
$(PYTHON) -m black --target-version py36 -l $(MAX_LINE_LENGTH) $(MRLN); \
$(PYTHON) -m black --target-version py36 -l $(MAX_LINE_LENGTH) $(TEST); \
$(PYTHON) -m black --target-version py36 -l $(MAX_LINE_LENGTH) *.py; \
$(PYTHON) -m black --target-version py38 -l $(MAX_LINE_LENGTH) $(MRLN); \
$(PYTHON) -m black --target-version py38 -l $(MAX_LINE_LENGTH) $(TEST); \
$(PYTHON) -m black --target-version py38 -l $(MAX_LINE_LENGTH) *.py; \


# Increment the Merlin version. USE ONLY ON DEVELOP BEFORE MERGING TO MASTER.
Expand Down
16 changes: 9 additions & 7 deletions docs/source/merlin_commands.rst
Original file line number Diff line number Diff line change
Expand Up @@ -110,8 +110,15 @@ Monitor (``merlin monitor``)
Batch submission scripts may not keep the batch allocation alive
if there is not a blocking process in the submission script. The
``merlin monitor`` command addresses this by providing a blocking process that
checks for tasks in the queues every (sleep) seconds. When the queues are empty, the
blocking process will exit and allow the allocation to end.
checks for tasks in the queues every (sleep) seconds. When the queues are empty, the
monitor will query celery to see if any workers are still processing tasks from the
queues. If no workers are processing any tasks from the queues and the queues are empty,
the blocking process will exit and allow the allocation to end.

The ``monitor`` function will check for celery workers for up to
10*(sleep) seconds before monitoring begins. The loop happens when the
queue(s) in the spec contain tasks, but no running workers are detected.
This is to protect against a failed worker launch.

.. code:: bash

Expand All @@ -129,11 +136,6 @@ for workers. The default is 60 seconds.

The only currently available option for ``--task_server`` is celery, which is the default when this flag is excluded.

The ``monitor`` function will check for celery workers for up to
10*(sleep) seconds before monitoring begins. The loop happens when the
queue(s) in the spec contain tasks, but no running workers are detected.
This is to protect against a failed worker launch.


Purging Tasks (``merlin purge``)
--------------------------------
Expand Down
14 changes: 13 additions & 1 deletion docs/source/merlin_specification.rst
Original file line number Diff line number Diff line change
Expand Up @@ -120,6 +120,9 @@ see :doc:`./merlin_variables`.
# The $(LAUNCHER) macro can be used to substitute a parallel launcher
# based on the batch:type:.
# It will use the nodes and procs values for the task.
# restart: The (optional) restart command to run when $(MERLIN_RESTART)
# is the exit code. The command in cmd will be run if the exit code
# is $(MERLIN_RETRY).
# task_queue: the queue to assign the step to (optional. default: merlin)
# shell: the shell to use for the command (eg /bin/bash /usr/bin/env python)
# (optional. default: /bin/bash)
Expand Down Expand Up @@ -156,6 +159,8 @@ see :doc:`./merlin_variables`.
cmd: |
cd $(runs1.workspace)/$(MERLIN_SAMPLE_PATH)
<post-process>
# exit $(MERLIN_RESTART) # syntax to send a restart error code
# This will rerun the cmd command. Users can also use $(MERLIN_RETRY).
nodes: 1
procs: 1
depends: [runs1]
Expand All @@ -167,7 +172,14 @@ see :doc:`./merlin_variables`.
cmd: |
touch learnrun.out
$(LAUNCHER) echo "$(VAR1) $(VAR2)" >> learnrun.out
exit $(MERLIN_RETRY) # some syntax to send a retry error code
exit $(MERLIN_RESTART) # syntax to send a restart error code
# exit $(MERLIN_RETRY) # syntax to send a retry error code to
# run the cmd command again instead of the restart command.
restart: |
# Command to run if the $(MERLIN_RESTART) exit code is used
touch learnrunrs.out
$(LAUNCHER) echo "$(VAR1) $(VAR2)" >> learnrunrs.out
exit $(MERLIN_SUCCESS) # syntax to send a success code
nodes: 1
procs: 1
task_queue: lqueue
Expand Down
4 changes: 2 additions & 2 deletions merlin/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down Expand Up @@ -38,7 +38,7 @@
import sys


__version__ = "1.11.0"
__version__ = "1.11.1"
VERSION = __version__
PATH_TO_PROJ = os.path.join(os.path.dirname(__file__), "")

Expand Down
2 changes: 1 addition & 1 deletion merlin/ascii_art.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/celery.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/common/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/common/abstracts/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/common/abstracts/enums/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/common/openfilelist.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/common/opennpylib.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/common/sample_index.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/common/sample_index_factory.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/common/security/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/common/security/encrypt.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/common/security/encrypt_backend_traffic.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
4 changes: 2 additions & 2 deletions merlin/common/tasks.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down Expand Up @@ -645,7 +645,7 @@ def expand_tasks_with_samples( # pylint: disable=R0913,R0914
if not found_tasks:
for next_index_path, next_index in sample_index.traverse(conditional=condition):
LOG.info(
f"generating next step for range {next_index.min}:{next_index.max} {next_index.max-next_index.min}"
f"generating next step for range {next_index.min}:{next_index.max} {next_index.max - next_index.min}"
)
next_index.name = next_index_path

Expand Down
2 changes: 1 addition & 1 deletion merlin/common/util_sampling.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/config/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/config/broker.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/config/celeryconfig.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/config/configfile.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/config/results_backend.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/config/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
2 changes: 1 addition & 1 deletion merlin/data/celery/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.11.0.
# This file is part of Merlin, Version: 1.11.1.
#
# For details, see https://github.com/LLNL/merlin.
#
Expand Down
Loading
Loading