Skip to content

Commit

Permalink
feature/revamped status (#464)
Browse files Browse the repository at this point in the history
* feature/new-status (#442)

* add backend functionality for merlin status

* add frontend functionality for merlin status

* add tests for merlin status

* run fix-style and remove import of deprecated function

* update CHANGELOG

* add more logging statements, make better use of glob

* run fix-style

* clean up test files a bit

* fix test suite after step_name_map mod

* add avg/std dev run time calculations to status

* modify status tests to accommodate new avg/std dev calculations

* fix linter issues

* fix lint issue and add test for avg/std dev calc

* feature/detailed-status (#451)

* Version/1.11.0 (#449)

* fix default worker bug with all steps

* version bump and requirements fix

* Bugfix/filename-special-vars (#425)

* fix file naming bug

* fix filename bug with variable as study name

* add tests for the file name special vars changes

* modify changelog

* implement Luc's suggestions

* remove replace line

* Create dependabot-changelog-updater.yml

* testing outputs of modifying changelog

* delete dependabot-changelog-updater

* feature/pdf-docs (#427)

* first attempt at adding pdf

* fixing build error

* modify changelog to show docs changes

* fix errors Luc found in the build logs

* trying out removal of latex

* reverting latex changes back

* uncommenting the latex_elements settings

* adding epub to see if latex will build

* adding a latex engine variable to conf

* fix naming error with latex_engine

* attempting to add a logo to the pdf build

* testing an override to the searchtools file

* revert back to not using searchtools override

* update changelog

* bugfix/openfoam_singularity_issues (#426)

* fix openfoam_singularity issues

* update requirements and descriptions for openfoam examples

* bugfix/output-path-substitution (#430)

* fix bug with output_path and variable substitution

* add tests for cli substitutions

* bugfix/scheduler-permission-error (#436)

* Release/1.10.2 (#437)

* bump version to 1.10.2

* bump version in CHANGELOG

* resolve develop to main merge issues (#439)

* fix default worker bug with all steps

* version bump and requirements fix

* dependabot/certifi-requests-pygments (#441)

* Bump certifi from 2022.12.7 to 2023.7.22 in /docs

Bumps [certifi](https://github.com/certifi/python-certifi) from 2022.12.7 to 2023.7.22.
- [Commits](certifi/python-certifi@2022.12.07...2023.07.22)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* add all dependabot changes and update CHANGELOG

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* bugfix/server-pip-redis-conf (#443)

* add *.conf to the MANIFEST file so pip will grab the redis.conf file

* add note explaining how to fix a hanging merlin server start

* modify CHANGELOG

* add second export option to docs and fix typo

* bump to version 1.10.3 (#444)

* bugfix/sphinx-5.3.0-requirement (#446)

* Version/1.10.3 (#445)

* fix default worker bug with all steps

* version bump and requirements fix

* Bugfix/filename-special-vars (#425)

* fix file naming bug

* fix filename bug with variable as study name

* add tests for the file name special vars changes

* modify changelog

* implement Luc's suggestions

* remove replace line

* Create dependabot-changelog-updater.yml

* testing outputs of modifying changelog

* delete dependabot-changelog-updater

* feature/pdf-docs (#427)

* first attempt at adding pdf

* fixing build error

* modify changelog to show docs changes

* fix errors Luc found in the build logs

* trying out removal of latex

* reverting latex changes back

* uncommenting the latex_elements settings

* adding epub to see if latex will build

* adding a latex engine variable to conf

* fix naming error with latex_engine

* attempting to add a logo to the pdf build

* testing an override to the searchtools file

* revert back to not using searchtools override

* update changelog

* bugfix/openfoam_singularity_issues (#426)

* fix openfoam_singularity issues

* update requirements and descriptions for openfoam examples

* bugfix/output-path-substitution (#430)

* fix bug with output_path and variable substitution

* add tests for cli substitutions

* bugfix/scheduler-permission-error (#436)

* Release/1.10.2 (#437)

* bump version to 1.10.2

* bump version in CHANGELOG

* resolve develop to main merge issues (#439)

* fix default worker bug with all steps

* version bump and requirements fix

* dependabot/certifi-requests-pygments (#441)

* Bump certifi from 2022.12.7 to 2023.7.22 in /docs

Bumps [certifi](https://github.com/certifi/python-certifi) from 2022.12.7 to 2023.7.22.
- [Commits](certifi/python-certifi@2022.12.07...2023.07.22)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* add all dependabot changes and update CHANGELOG

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* bugfix/server-pip-redis-conf (#443)

* add *.conf to the MANIFEST file so pip will grab the redis.conf file

* add note explaining how to fix a hanging merlin server start

* modify CHANGELOG

* add second export option to docs and fix typo

* bump to version 1.10.3 (#444)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* change hardcoded sphinx requirement

* update CHANGELOG

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feature/vlauncher (#447)

* fix file naming error for iterative workflows

* fixed small bug with new filepath naming

* add VLAUNCHER functionality

* add docs for VLAUNCHER and modify changelog

* re-word docs and fix table format

* add a test for vlauncher

* run fix-style and add a test for vlauncher

* Add the find_vlaunch_var and setup_vlaunch functions.
The numeric value of the shell variables may not be defined until run
time, so replace with variable strings instead of values.
Consolidate the commands into one function.

* Add variable set for (t)csh.

* Run fix-style

* make step settings the defaults and ignore commented lines

* add some additional tests

* remove regex library import

---------

Co-authored-by: Joseph M. Koning <koning1@llnl.gov>

* release/1.11.0 (#448)

* bugfix/skewed-sample-hierarchy (#450)

* add patch for skewed sample hierarchy/additional samples

* update changelog

* catch narrower range of exceptions

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Joseph M. Koning <koning1@llnl.gov>

* add functionality for the detailed-status command

* add tests for detailed-status

* fix linter issues

* update changelog

* general cleanup and add log statements

* slightly modify two tests

* default status renderer now uses json status format

* remove inaccurate comment

* bugfix/lsf-gpu-typo (#453)

* fix typo in batch.py that causes a bug

* change print statements to log statements

* release/1.11.1 (#454)

* Add Pytest Fixtures to Test Suite (#456)

* begin work on integration refactor; create fixtures and initial tests

* update CHANGELOG and run fix-style

* add pytest fixtures and README explaining them

* add tests to demonstrate how to use the fixtures

* move/rename some files and modify integration's README

* add password change to redis.pass file

* fix lint issues

* modify redis pwd for test server to be constant for each test

* fix lint issue only caught on github ci

* Bugfix for WEAVE CI (#457)

* begin work on integration refactor; create fixtures and initial tests

* update CHANGELOG and run fix-style

* add pytest fixtures and README explaining them

* add tests to demonstrate how to use the fixtures

* move/rename some files and modify integration's README

* add password change to redis.pass file

* fix lint issues

* modify redis pwd for test server to be constant for each test

* fix lint issue only caught on github ci

* add fix for merlin server startup

* update CHANGELOG

* bugfix/monitor-shutdown (#452)

* add celery query to see if workers still processing tasks

* fix merlin status when using redis as broker

* fix consumer count bug and run fix-style

* fix linter issues

* update changelog

* update docs for monitor

* remove unused exception I previously added

* first attempt at using pytest fixtures for monitor tests

* (partially) fix launch_workers fixture so it can be used in multiple classes

* fix linter issues and typo on pytest decorator

* update black's python version and fix style issue

* remove print statements from celeryadapter.py

* workers manager is now allowed to be used as a context manager

* add one thing to changelog and remove print statement

* Add the missing restart keyword to the specification docs. (#459)

* add Jeremy's suggestion to change vars option to output-path

* remove unnecessary lines from CHANGELOG

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Joseph M. Koning <koning1@llnl.gov>
Co-authored-by: Joe Koning <koning@users.noreply.github.com>

* feature/queue info (#461)

* remove a merge conflict statement that was missed

* add queue-info functionality

* add tests for queue-info

* update CHANGELOG

* add try/except for forceful termination of test workers

* change github workflow to use py38 with black instead of py36

* run fix-style with py 3.12 and fix a typo in a test

* add filetype check for dump option

* add banner print statement

* docs/revamped status (#462)

* fix broken image link in README

* add new commands to the command line page

* add monitoring docs layout and complete status cmds page

* fix bug with dumping queue-info to files

* add docs for queue-info

* add documentation for 'query-workers'

* add reference to new query-workers docs and split a paragraph

* fix small bug with --steps option of monitor

* add documentation for monitor command

* update CHANGELOG

* fix dump-csv image for queue-info

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Joseph M. Koning <koning1@llnl.gov>
Co-authored-by: Joe Koning <koning@users.noreply.github.com>
  • Loading branch information
4 people authored Feb 14, 2024
1 parent f36c58d commit 320d12f
Show file tree
Hide file tree
Showing 97 changed files with 9,103 additions and 252 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/push-pr_workflow.yml
Original file line number Diff line number Diff line change
Expand Up @@ -56,9 +56,9 @@ jobs:
- name: Lint with Black
run: |
python3 -m black --check --line-length $MAX_LINE_LENGTH --target-version py36 merlin
python3 -m black --check --line-length $MAX_LINE_LENGTH --target-version py36 tests
python3 -m black --check --line-length $MAX_LINE_LENGTH --target-version py36 *.py
python3 -m black --check --line-length $MAX_LINE_LENGTH --target-version py38 merlin
python3 -m black --check --line-length $MAX_LINE_LENGTH --target-version py38 tests
python3 -m black --check --line-length $MAX_LINE_LENGTH --target-version py38 *.py
- name: Lint with PyLint
run: |
Expand Down
59 changes: 54 additions & 5 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,17 +6,60 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]
### Added
- A new command `merlin queue-info` that will print the status of your celery queues
- By default this will only pull information from active queues
- There are options to look for specific queues (`--specific-queues`), queues defined in certain spec files (`--spec`; this is the same functionality as the `merlin status` command prior to this update), and queues attached to certain steps (`--steps`)
- Queue info can be dumped to outfiles with `--dump`
- A new command `merlin detailed-status` that displays task-by-task status information about your study
- This has options to filter by return code, task queues, task statuses, and workers
- You can set a limit on the number of tasks to display
- There are 3 options to modify the output display
- Docs for all of the monitoring commands
- New file `merlin/study/status.py` dedicated to work relating to the status command
- Contains the Status and DetailedStatus classes
- New file `merlin/study/status_renderers.py` dedicated to formatting the output for the detailed-status command
- New file `merlin/common/dumper.py` containing a Dumper object to help dump output to outfiles
- Study name and parameter info now stored in the DAG and MerlinStep objects
- Added functions to `merlin/display.py` that help display status information:
- `display_task_by_task_status` handles the display for the `merlin detailed-status` command
- `display_status_summary` handles the display for the `merlin status` command
- `display_progress_bar` generates and displays a progress bar
- Added new methods to the MerlinSpec class:
- get_worker_step_map()
- get_queue_step_relationship()
- get_tasks_per_step()
- get_step_param_map()
- Added methods to the MerlinStepRecord class to mark status changes for tasks as they run (follows Maestro's StepRecord format mostly)
- Added methods to the Step class:
- establish_params()
- name_no_params()
- Added a property paramater_labels to the MerlinStudy class
- Added two new utility functions:
- dict_deep_merge() that deep merges two dicts into one
- ws_time_to_dt() that converts a workspace timestring (YYYYMMDD-HHMMSS) to a datetime object
- A new celery task `condense_status_files` to be called when sets of samples finish
- Added a celery config setting `worker_cancel_long_running_tasks_on_connection_loss` since this functionality is about to change in the next version of celery
- Tests for the Status and DetailedStatus classes
- this required adding a decent amount of test files to help with the tests; these can be found under the tests/unit/study/status_test_files directory
- Pytest fixtures in the `conftest.py` file of the integration test suite
- NOTE: an export command `export LC_ALL='C'` had to be added to fix a bug in the WEAVE CI. This can be removed when we resolve this issue for the `merlin server` command
- Tests for the `celeryadapter.py` module
- New CeleryTestWorkersManager context to help with starting/stopping workers for tests

### Fixed
- The `merlin status` command so that it's consistent in its output whether using redis or rabbitmq as the broker
- The `merlin monitor` command will now keep an allocation up if the queues are empty and workers are still processing tasks
- Add the restart keyword to the specification docs

### Changed
- Reformatted the entire `merlin status` command
- Now accepts both spec files and workspace directories as arguments
- Removed the --steps flag
- Replaced the --csv flag with the --dump flag
- New functionality:
- Shows step_by_step progress bar for tasks
- Displays a summary of task statuses below the progress bar
- Split the `add_chains_to_chord` function in `merlin/common/tasks.py` into two functions:
- `get_1d_chain` which converts a 2D list of chains into a 1D list
- `launch_chain` which launches the 1D chain
- Pulled the needs_merlin_expansion() method out of the Step class and made it a function instead
- Removed `tabulate_info` function; replaced with tabulate from the tabulate library
- Moved `verify_filepath` and `verify_dirpath` from `merlin/main.py` to `merlin/utils.py`
- The entire documentation has been ported to MkDocs and re-organized
- *Dark Mode*
- New "Getting Started" example for a simple setup tutorial
Expand All @@ -32,6 +75,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- New "Contact" page with info on reaching Merlin devs
- The Merlin tutorial defaults to using Singularity rather than Docker for the OpenFoam example. Minor tutorial fixes have also been applied.

### Fixed
- The `merlin status` command so that it's consistent in its output whether using redis or rabbitmq as the broker
- The `merlin monitor` command will now keep an allocation up if the queues are empty and workers are still processing tasks
- Add the restart keyword to the specification docs
- Cyclical imports and config imports that could easily cause ci issues

## [1.11.1]
### Fixed
- Typo in `batch.py` that caused lsf launches to fail (`ALL_SGPUS` changed to `ALL_GPUS`)
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@ HPC batch systems, since it can scale to a very large number of jobs.

The integrated system looks a little something like this:

<img src="docs/images/merlin_arch.png" alt="a typical Merlin workflow">
![A Typical Merlin Workflow](docs/assets/images/merlin_arch.png)

In this example, here's how it all works:

Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading

0 comments on commit 320d12f

Please sign in to comment.