1.8.0 (#323)
* Squash and merge develop to Main

* Master (#2)

* 1.7.0 (#257)

* locked maestro version

* incremented to 1.5.3

* Update CHANGELOG.md

* Feature/koning/vhost no leading slash (#217)

* The broker name can now be amqps (with ssl) or amqp (without ssl).
The default rabbitmq vhost is now <user> instead of /<user>.

* Undo fix typos.

* Fix spacing.

* Add amqps conn for self doc.

* Add initial running workers check to monitor. This should eliminate any race conditions. (#218)

* updated timeout (#223)

* updated default task timeout

* Added to CHANGELOG

* Feature/koning/create encrypt key init (#219)

* Add init_key to create the encryption key when merlin config is run.

* Move import to point of use for encryption.

* Feature/koning/monitor upgrade (#221)

* Fix sleep variable.

* Change another sleep variable.

* Change brackets.

* Fix clarity for jobs but no consumers condition.

* Fix comment.

* Update comment.

* Move check_merlin_status to router.

* Run fix-style.

* Bugfix/ben/expand name (#220)

* working on bugfix

* achieved desired behavior

* made expansion and file writing conditional

* fixed style

* updated CHANGELOG

* fixed samples and restart

* fixed style

* used variable

* added name expansion test

* temporarily commented out one test

* fixed style

* added debug block

* changes

* corrected bug

* updated example; scripts are now copied into merlin_info

* Feature/ben/expose visibility timeout (#224)

* exposed visibility timeout in config file

* added visibility timeouts to default config files

* fixed style

* added seconds to clarify name

* fixed attribute name

* locked celery version at 4.4.2

* updated CHANGELOG

* v1.5.4

* adjusted CHANGELOG

* adjusted CHANGELOG

* Feature/koning/info conn timeout (#226)

* Add a timeout check for the merlin info kombu connection test. The redis
server will not comply with the connection_timeout config.

* Update docs.

* Load default timeout if none in config file (#229)

* 1.6.1

* bugfix (#232)

* bugfix

* removed repetition in logic

* Log sample generation to merlin_info (#234)

* sample generation is now logged to merlin_info

* Added to CHANGELOG

* fixed style

* Maestro 1.1.8+ update (#233)

* working on adding maestro's schema validation to merlin

* updated schemas

* adding merlin_schema to spec logic

* added debug prints

* added print

* fixing vestigial Maestro calls

* adjusted merlin_schemas

* fixed json file

* validation progress

* added from_dict replacement logic

* adjusted maestrowf version req

* fixed string

* schema improvements

* merlinsection schema updates

* merlin section error parsing appears to work

* updated CHANGELOG

* fixed merge error

* Added walltime to json schema

* Add the bank and walltime keywords to the batch slurm launch; these will not alter the lsf launch. (#236)

* Bugfix for new celery versions (#231)

* Change the expand_tasks_with_samples task to use an immutable signature;
no return code is necessary.

* Change only expand_tasks_with_samples to si()

* Increase celery version requirement to the latest (4.4.5).

* Update CHANGELOG.

Co-authored-by: Benjamin Bay <48391872+ben-bay@users.noreply.github.com>

* hidden test specs (#237)

* added tests to run_tests.py

* updated merlinsection schema

* added full_spec test

* added to CHANGELOG

* merlinspec change

* use regex to find variable tokens (#239)

* now using regex to find variable tokens

* added shell ref function

* added to CHANGELOG

* Revert maestro 1.1.8 support (#241)

* reverted support of Maestro 1.1.8

* removed tests that use maestro validation

* fixed style

* fixed cli tests, reactivated 3 old ones (#242)

* fixed cli tests, reactivated 3 old ones

* added to CHANGELOG

* More provenance specs + logic improvements (#240)

* added orig.yaml and expanded.yaml

* added partial provenance spec

* fixed style

* added to CHANGELOG

* removed comments

* made env replacement more precise

* fixed restart bug

* changed provenance file names

* logic update

* fixed style

* expansion now working

* added cli test logic to test all 3 provenance file types

* updated CHANGELOG

* expansion sans io (#243)

* reverted support of Maestro 1.1.8

* removed tests that use maestro validation

* fixed style

* added sections property to spec

* reducing number of file writes and reads

* provenance specs are correct

* fixed style

* removed unused imports

* cleaning up logic

* fixed style

* final logic improvements

* removed unused property

* updated CHANGELOG

* v1.6.2

* debugging study

* added provenance tests

* fixed path in test

* added check for workflow name expanding to invalid filename

* Docs/ben/level max dirs (#250)

* added to documentation

* added to CHANGELOG

* added default

* updated docs to reflect recent provenance spec update (#251)

* Feature/ben/pgen (#248)

* added basic pgen support

* added to CHANGELOG

* pgen, pargs, and env appear to work

* added to CHANGELOG

* fixed typo

* added pgen cli test

* added faq entry on pgen

* Update Dockerfile (#254)

* Update CHANGELOG.md

* pretty yaml (#252)

* starting

* building pretty_dump function

* updates

* fixed style

* tweaks

* valid yaml

* added yaml_sections

* added to CHANGELOG

* removed old yaml representer code

* added cli test for equality of provenance specs

* fixed pgen test

* disabled test while not working

* fixed yaml lists for source, paths, git

* tweaks / fixes

* env variable bugfix (#247)

* removed expansion of env variables in provenance specs

* added to CHANGELOG

* corrected unit tests

* added todos

* working on expand_env_var function

* fixed type bug

* removed comments, prints, fixed style

* added to CHANGELOG

* tweaks

* fixed output path

* allowed study name to use env vars

* fixed style

* added to CHANGELOG, added support for 'restart'

* Update CHANGELOG.md

* added docstrings

* added to docs

* updated comment

* fixed CHANGELOG

* removed debug print

* flux examples fix (#253)

* Update flux example workflows to use new kvs query methods.

* Update CHANGELOG with fix info.

* Add coker fix comment.

* Update CHANGELOG.md

* added gitlab ci file

* improved gitlab ci setup

* renamed file

* hotfix to allow distributed cli tests to run

* redis update (#255)

* Update redis to new version with TLS support.

* Add changelog.

* Bugfix for rediss cert_reqs keyword. (#256)

* expose celery (#245)

* added default app.yaml, still in progress...

* added celery defaults

* Update app.yaml

* config override appears to be working

* corrected value name

* shifted some properties to be optional defaults

* removed unneeded file

* improvements

* fixed style

* fixed style

* hid 'no override' message

* tweaked var names

* added 2 tests for celery app (#246)

* fixed test

* fixed workers bug

* fixed unit tests for running offline

* removed 2 unit tests

* fixed CHANGELOG

* added info print of all celery configurations being overridden

* 1.7.0

* 1.7.0

Co-authored-by: Joe Koning <koning@users.noreply.github.com>
Co-authored-by: Luc Peterson <peterson76@llnl.gov>
Co-authored-by: fixdocker <68359629+fixdocker@users.noreply.github.com>

* 1.7.1 hotfix (#261)

* copy samplesfile to merlin_info (#259)

* copy samplesfile to merlin_info

* added to CHANGELOG

* Bugfix/koning/process exceptions (#260)

* Add bugfix log.

* Add an exception catcher for the multiprocessing Process class.

* fixed style

* 1.7.1

Co-authored-by: Joe Koning <koning@users.noreply.github.com>
Co-authored-by: Luc Peterson <peterson76@llnl.gov>
Co-authored-by: fixdocker <68359629+fixdocker@users.noreply.github.com>

* 1.7.2 (#264)

* hotfix

* fix

* 1.7.2

Co-authored-by: Joe Koning <koning@users.noreply.github.com>
Co-authored-by: Luc Peterson <peterson76@llnl.gov>
Co-authored-by: fixdocker <68359629+fixdocker@users.noreply.github.com>

* 1.7.2 (#265)

* Update CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: Joe Koning <koning@users.noreply.github.com>
Co-authored-by: Luc Peterson <peterson76@llnl.gov>
Co-authored-by: fixdocker <68359629+fixdocker@users.noreply.github.com>

* 1.7.3 (#266)

* locked maestro version

* incremented to 1.5.3

* Update CHANGELOG.md

* Feature/koning/vhost no leading slash (#217)

* The broker name can now be amqps (with ssl) or amqp (without ssl).
The default rabbitmq vhost is now <user> instead of /<user>.

* Undo fix typos.

* Fix spacing.

* Add amqps conn for self doc.

* Add initial running workers check to monitor. This should eliminate any race conditions. (#218)

* updated timeout (#223)

* updated default task timeout

* Added to CHANGELOG

* Feature/koning/create encrypt key init (#219)

* Add init_key to create the encryption key when merlin config is run.

* Move import to point of use for encryption.

* Feature/koning/monitor upgrade (#221)

* Fix sleep variable.

* Change another sleep variable.

* Change brackets.

* Fix clarity for jobs but no consumers condition.

* Fix comment.

* Update comment.

* Move check_merlin_status to router.

* Run fix-style.

* Bugfix/ben/expand name (#220)

* working on bugfix

* achieved desired behavior

* made expansion and file writing conditional

* fixed style

* updated CHANGELOG

* fixed samples and restart

* fixed style

* used variable

* added name expansion test

* temporarily commented out one test

* fixed style

* added debug block

* changes

* corrected bug

* updated example; scripts are now copied into merlin_info

* Feature/ben/expose visibility timeout (#224)

* exposed visibility timeout in config file

* added visibility timeouts to default config files

* fixed style

* added seconds to clarify name

* fixed attribute name

* locked celery version at 4.4.2

* updated CHANGELOG

* v1.5.4

* adjusted CHANGELOG

* adjusted CHANGELOG

* Feature/koning/info conn timeout (#226)

* Add a timeout check for the merlin info kombu connection test. The redis
server will not comply with the connection_timeout config.

* Update docs.

* Load default timeout if none in config file (#229)

* 1.6.1

* bugfix (#232)

* bugfix

* removed repetition in logic

* Log sample generation to merlin_info (#234)

* sample generation is now logged to merlin_info

* Added to CHANGELOG

* fixed style

* Maestro 1.1.8+ update (#233)

* working on adding maestro's schema validation to merlin

* updated schemas

* adding merlin_schema to spec logic

* added debug prints

* added print

* fixing vestigial Maestro calls

* adjusted merlin_schemas

* fixed json file

* validation progress

* added from_dict replacement logic

* adjusted maestrowf version req

* fixed string

* schema improvements

* merlinsection schema updates

* merlin section error parsing appears to work

* updated CHANGELOG

* fixed merge error

* Added walltime to json schema

* Add the bank and walltime keywords to the batch slurm launch; these will not alter the lsf launch. (#236)

* Bugfix for new celery versions (#231)

* Change the expand_tasks_with_samples task to use an immutable signature;
no return code is necessary.

* Change only expand_tasks_with_samples to si()

* Increase celery version requirement to the latest (4.4.5).

* Update CHANGELOG.

Co-authored-by: Benjamin Bay <48391872+ben-bay@users.noreply.github.com>

* hidden test specs (#237)

* added tests to run_tests.py

* updated merlinsection schema

* added full_spec test

* added to CHANGELOG

* merlinspec change

* use regex to find variable tokens (#239)

* now using regex to find variable tokens

* added shell ref function

* added to CHANGELOG

* Revert maestro 1.1.8 support (#241)

* reverted support of Maestro 1.1.8

* removed tests that use maestro validation

* fixed style

* fixed cli tests, reactivated 3 old ones (#242)

* fixed cli tests, reactivated 3 old ones

* added to CHANGELOG

* More provenance specs + logic improvements (#240)

* added orig.yaml and expanded.yaml

* added partial provenance spec

* fixed style

* added to CHANGELOG

* removed comments

* made env replacement more precise

* fixed restart bug

* changed provenance file names

* logic update

* fixed style

* expansion now working

* added cli test logic to test all 3 provenance file types

* updated CHANGELOG

* expansion sans io (#243)

* reverted support of Maestro 1.1.8

* removed tests that use maestro validation

* fixed style

* added sections property to spec

* reducing number of file writes and reads

* provenance specs are correct

* fixed style

* removed unused imports

* cleaning up logic

* fixed style

* final logic improvements

* removed unused property

* updated CHANGELOG

* v1.6.2

* debugging study

* added provenance tests

* fixed path in test

* added check for workflow name expanding to invalid filename

* Docs/ben/level max dirs (#250)

* added to documentation

* added to CHANGELOG

* added default

* updated docs to reflect recent provenance spec update (#251)

* Feature/ben/pgen (#248)

* added basic pgen support

* added to CHANGELOG

* pgen, pargs, and env appear to work

* added to CHANGELOG

* fixed typo

* added pgen cli test

* added faq entry on pgen

* Update Dockerfile (#254)

* Update CHANGELOG.md

* pretty yaml (#252)

* starting

* building pretty_dump function

* updates

* fixed style

* tweaks

* valid yaml

* added yaml_sections

* added to CHANGELOG

* removed old yaml representer code

* added cli test for equality of provenance specs

* fixed pgen test

* disabled test while not working

* fixed yaml lists for source, paths, git

* tweaks / fixes

* env variable bugfix (#247)

* removed expansion of env variables in provenance specs

* added to CHANGELOG

* corrected unit tests

* added todos

* working on expand_env_var function

* fixed type bug

* removed comments, prints, fixed style

* added to CHANGELOG

* tweaks

* fixed output path

* allowed study name to use env vars

* fixed style

* added to CHANGELOG, added support for 'restart'

* Update CHANGELOG.md

* added docs strings

* added to docs

* updated comment

* fixed CHANGELOG

* removed debug print

* flux examples fix (#253)

* Update flux example workflows to use new kvs query methods.

* Update CHANGELOG with fix info.

* Add coker fix comment.

* Update CHANGELOG.md

* added gitlab ci file

* improved gitlab ci setup

* renamed file

* hotfix to allow distributed cli tests to run

* redis update (#255)

* Update redis to new version with TLS support.

* Add changelog.

* Bugfix for rediss cert_reqs keyword. (#256)

* expose celery (#245)

* added default app.yaml, still in progress...

* added celery defaults

* Update app.yaml

* config override appears to be working

* corrected value name

* shifted some properties to be optional defaults

* removed unneeded file

* improvements

* fixed style

* fixed style

* hid 'no override' message

* tweaked var names

* added 2 tests for celery app (#246)

* fixed test

* fixed workers bug

* fixed unit tests for running offline

* removed 2 unit tests

* fixed CHANGELOG

* added info print of all celery configurations being overridden

* 1.7.0

* 1.7.0

* copy samplesfile to merlin_info (#259)

* copy samplesfile to merlin_info

* added to CHANGELOG

* Bugfix/koning/process exceptions (#260)

* Add bugfix log.

* Add an exception catcher for the multiprocessing Process class.

* fixed style

* 1.7.1

* hotfix

* fix

* 1.7.2

* Update CHANGELOG.md

* Update CHANGELOG.md

* 1.7.3

* 1.7.3

Co-authored-by: Joe Koning <koning@users.noreply.github.com>
Co-authored-by: Luc Peterson <peterson76@llnl.gov>
Co-authored-by: fixdocker <68359629+fixdocker@users.noreply.github.com>

Co-authored-by: Benjamin Bay <48391872+ben-bay@users.noreply.github.com>
Co-authored-by: Joe Koning <koning@users.noreply.github.com>
Co-authored-by: fixdocker <68359629+fixdocker@users.noreply.github.com>

* Revert "Master (#2)"

This reverts commit 1d7495b.

* [Flake8 src refactor] display.py and broker.py refactored.

* [Standardize CI linting and Makefile] Updated the Makefile and CI to hold one standard for linting from setup.cfg.

* Add ability to enter step walltime in HH:MM:SS format, with HH and MM
optional. Previously, script adapters assumed a format that depended on
the batch type. Now the value is converted internally to the right format.

* Updated CHANGELOG and docs

* Add testing for integers too

* An explicit conversion test. Making mypy and black happy.

* Move to 1.8.0

* Explicitly wrap queue name string in quotes to make it more robust (#320)

* Explicitly wrap queue name string in quotes to make it more robust

* Move CHANGELOG note to v1.8.0

* Update CHANGELOG.md

* Update test_time_formats.py

Utilized typing for all variables, utilizing PyTest parametrized marks to allow testing to flow regardless of individual fail states of test data.

* Update test_time_formats.py

Added test to verify some exception handling, compressed tests to not have as much indirection.

* Typo in variable name change

* Fix style before merging, update to Makefile so fix-style upgrades all dev packages before style check. (#321)

Co-authored-by: Alexander Cameron Winter <winter27@quartz1148.llnl.gov>

* Revert "Merge branch 'main' into develop" using -m option.

This reverts commit a9276b8, reversing
changes made to 63a094b.

Co-authored-by: Luc Peterson <peterson76@llnl.gov>
Co-authored-by: Joe Koning <koning@users.noreply.github.com>
Co-authored-by: fixdocker <68359629+fixdocker@users.noreply.github.com>
Co-authored-by: Alexander Cameron Winter <winter27@quartz1148.llnl.gov>
Co-authored-by: Alexander Winter <80929291+AlexanderWinterLLNL@users.noreply.github.com>
6 people committed Jul 6, 2021
1 parent 00f13c4 commit eded0d5
Showing 83 changed files with 1,507 additions and 1,094 deletions.
49 changes: 49 additions & 0 deletions .github/workflows/push-pr_workflow.yml
@@ -0,0 +1,49 @@
name: Python CI

on: [push, pull_request]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.6, 3.7, 3.8, 3.9]

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python-version }}

    - name: Install dependencies
      run: |
        python3 -m pip install --upgrade pip
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
        pip3 install -r requirements/dev.txt
    - name: Install merlin to run unit tests
      run: |
        pip3 install -e .
        merlin config
    - name: Install CLI task dependencies generated from the 'feature demo' workflow
      run: |
        merlin example feature_demo
        pip3 install -r feature_demo/requirements.txt
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=15 --statistics --max-line-length=88
    - name: Run pytest over unit test suite
      run: |
        python3 -m pytest tests/unit/
    - name: Run integration test suite, locally
      run: |
        python3 tests/integration/run_tests.py --verbose --local
17 changes: 0 additions & 17 deletions .travis.yml

This file was deleted.

35 changes: 35 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,41 @@ All notable changes to Merlin will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).



## [1.8.0]

### Added
- `retry_delay` field in a step to specify a countdown in seconds prior to running a
restart or retry.
- New merlin example `restart_delay` that demonstrates usage of this feature.
- Condition failure reporting, to give greater insight into what caused test failure.
- New fields in config file: `celery.omit_queue_tag` and `celery.queue_tag`, for
users who wish to have complete control over their queue names. This is a feature
of the task priority change.

### Changed
- `feature_demo` now uses `merlin-spellbook` instead of its own scripts.
- Removed the `--mpi=none` `srun` default launch argument. This can be added by
setting the `launch_args` argument in the batch section in the spec.
- Merlin CI is now handled by Github Actions.
- Certain tests and source code have been refactored to abide by Flake8 conventions.
- Reorganized the `tests` module. Made `unit` dir alongside `integration` dir. Decomposed
`run_tests.py` into 3 files with distinct responsibilities. Renamed `Condition` classes.
Grouped cli tests by sub-category for easier developer interpretation.
- Lowered the command line test log level to "ERROR" to reduce spam in `--verbose` mode.
- Now prioritizing workflow tasks over task-expansion tasks, enabling improved
scalability and server stability.
- Flake8 examination slightly modified for more generous cyclomatic complexity.
- Certain tests and source code have been refactored to abide by Flake8 conventions.
- `walltime` can be specified in hours:minutes:seconds, minutes:seconds, or
seconds format and is translated into the syntax the target batch system expects.
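
To make the `walltime` entry concrete, here is a minimal Python sketch of how the three accepted forms could be normalized. The helper names `walltime_to_seconds` and `seconds_to_hms` are hypothetical and chosen for illustration; they are not taken from Merlin's source, and the format each batch system finally expects is not stated in this changelog.

```python
def walltime_to_seconds(walltime):
    """Convert 'hh:mm:ss', 'mm:ss', 'ss' (or an int) into total seconds.

    Hypothetical helper for illustration; not Merlin's actual implementation.
    """
    parts = [int(p) for p in str(walltime).split(":")]
    if not 1 <= len(parts) <= 3 or any(p < 0 for p in parts):
        raise ValueError(f"unrecognized walltime: {walltime!r}")
    seconds = 0
    for part in parts:
        # Each further field is 60 times finer than the one before it.
        seconds = seconds * 60 + part
    return seconds


def seconds_to_hms(total_seconds):
    """Render a number of seconds as zero-padded 'HH:MM:SS'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```

Going through plain seconds first keeps the parsing independent of the scheduler, so a batch adapter only has to format the result.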

### Fixed
- For Celery calls, explicitly wrapped queue string in quotes for robustness. This fixes
a bug that occurred on tcsh but not bash in which square brackets in the queue name were
misinterpreted and caused the command to break.
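
The quoting fix can be illustrated with a short Python sketch that uses the standard library's `shlex.quote`. The function name `build_worker_command`, the app name `merlin`, and the queue name are assumptions made for this example; the command string Merlin actually builds may differ.

```python
import shlex


def build_worker_command(queue_name):
    """Return a shell command string with the queue name safely quoted.

    Illustrative only: the command layout is an assumption, not Merlin's code.
    """
    # shlex.quote wraps the value in single quotes when it contains characters
    # (such as square brackets) that a shell could treat as glob patterns.
    return f"celery -A merlin worker -Q {shlex.quote(queue_name)}"


print(build_worker_command("[merlin]_simulation"))
```

Quoting at the point where the command string is assembled means the result does not depend on which shell eventually runs it.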

## [1.7.9]

### Fixed
33 changes: 11 additions & 22 deletions Makefile
@@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.7.9.
# This file is part of Merlin, Version: 1.8.0.
#
# For details, see https://github.com/LLNL/merlin.
#
@@ -27,24 +27,7 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
###############################################################################

PYTHON?=python3
PYV=$(shell $(PYTHON) -c "import sys;t='{v[0]}_{v[1]}'.format(v=list(sys.version_info[:2]));sys.stdout.write(t)")
PYVD=$(shell $(PYTHON) -c "import sys;t='{v[0]}.{v[1]}'.format(v=list(sys.version_info[:2]));sys.stdout.write(t)")
VENV?=venv_merlin_py$(PYV)
PIP?=$(VENV)/bin/pip
MRLN=merlin
TEST=tests
DOCS=docs
WKFW=merlin/examples/workflows/
MAX_COMPLEXITY?=5

VER?=1.0.0
VSTRING=[0-9]\+\.[0-9]\+\.[0-9]\+
CHANGELOG_VSTRING="## \[$(VSTRING)\]"
INIT_VSTRING="__version__ = \"$(VSTRING)\""

PENV=merlin$(PYV)
include config.mk

.PHONY : all
.PHONY : install-dev
@@ -65,6 +48,7 @@ PENV=merlin$(PYV)
.PHONY : check-camel-case
.PHONY : checks
.PHONY : reqlist
.PHONY : check-variables


all: install-dev install-merlin install-workflow-deps
@@ -75,6 +59,10 @@ install-dev: virtualenv
$(PIP) install -r requirements/dev.txt


check-variables:
- echo MAX_LINE_LENGTH $(MAX_LINE_LENGTH)


# this only works outside the venv
virtualenv:
$(PYTHON) -m venv $(VENV) --prompt $(PENV) --system-site-packages
@@ -121,12 +109,12 @@ release:


unit-tests:
-$(PYTHON) -m pytest $(TEST)
-$(PYTHON) -m pytest $(UNIT)


# run CLI tests
cli-tests:
-$(PYTHON) $(TEST)/integration/run_tests.py
-$(PYTHON) $(TEST)/integration/run_tests.py --local


# run unit and CLI tests
@@ -135,6 +123,7 @@ tests: unit-tests cli-tests

# automatically make python files pep 8-compliant
fix-style:
pip3 install -r requirements/dev.txt -U
isort -rc $(MRLN)
isort -rc $(TEST)
isort *.py
@@ -145,7 +134,7 @@ fix-style:

# run code style checks
check-style:
-$(PYTHON) -m flake8 --max-complexity $(MAX_COMPLEXITY) --exclude ascii_art.py $(MRLN)
-$(PYTHON) -m flake8 --max-complexity $(MAX_COMPLEXITY) --max-line-length $(MAX_LINE_LENGTH) --exclude ascii_art.py $(MRLN)
-black --check --target-version py36 $(MRLN)


25 changes: 25 additions & 0 deletions config.mk
@@ -0,0 +1,25 @@
PYTHON?=python3
PYV=$(shell $(PYTHON) -c "import sys;t='{v[0]}_{v[1]}'.format(v=list(sys.version_info[:2]));sys.stdout.write(t)")
PYVD=$(shell $(PYTHON) -c "import sys;t='{v[0]}.{v[1]}'.format(v=list(sys.version_info[:2]));sys.stdout.write(t)")
VENV?=venv_merlin_py$(PYV)
PIP?=$(VENV)/bin/pip
MRLN=merlin
TEST=tests
UNIT=$(TEST)/unit
DOCS=docs
WKFW=merlin/examples/workflows/
# check setup.cfg exists
ifeq (,$(wildcard setup.cfg))
MAX_COMPLEXITY?=15
MAX_LINE_LENGTH=127
else
MAX_COMPLEXITY?=$(shell grep 'max-complexity' setup.cfg | cut -d ' ' -f3)
MAX_LINE_LENGTH=$(shell grep 'max-line-length' setup.cfg | cut -d ' ' -f3)
endif

VER?=1.0.0
VSTRING=[0-9]\+\.[0-9]\+\.[0-9]\+
CHANGELOG_VSTRING="## \[$(VSTRING)\]"
INIT_VSTRING="__version__ = \"$(VSTRING)\""

PENV=merlin$(PYV)
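
The `grep ... | cut -d ' ' -f3` lookups in `config.mk` rely on each `setup.cfg` line being written as `key = value` with single spaces. As a point of comparison, here is a minimal Python sketch that reads the same two settings with the standard library's `configparser`. It assumes the options live in a `[flake8]` section, which this diff does not show, and the function name `read_flake8_limits` is hypothetical.

```python
import configparser


def read_flake8_limits(cfg_text, default_complexity=15, default_line_length=127):
    """Return (max_complexity, max_line_length) from setup.cfg-style text.

    Assumes a [flake8] section; falls back to the defaults config.mk uses
    when setup.cfg is missing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    complexity = parser.getint(
        "flake8", "max-complexity", fallback=default_complexity
    )
    line_length = parser.getint(
        "flake8", "max-line-length", fallback=default_line_length
    )
    return complexity, line_length
```

Unlike the field-splitting approach, this tolerates `key=value` and extra whitespace, at the cost of needing Python instead of plain shell tools.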
48 changes: 47 additions & 1 deletion docs/source/faq.rst
@@ -138,9 +138,43 @@ The max number of retries in given step can be specified with the ``max_retries`

Alternatively, use ``exit $(MERLIN_RESTART)`` to run the optional ``<step>.run.restart`` section.

To delay a retry or restart directive, add the ``retry_delay`` field to the step.
Note: ``retry_delay`` only works in server mode (i.e., not ``--local`` mode).

To restart failed steps after a workflow is done running, see :ref:`restart`.


How do I put a time delay in before a restart or retry?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Add the ``retry_delay`` field to the step. This specifies how many seconds to wait before the
task is run again after a restart or retry. Set this value large enough for your problem to finish.

See the ``merlin example restart_delay`` example for syntax.

Note: ``retry_delay`` only works in server mode (i.e., not ``--local`` mode).

I have a long running batch task that needs to restart, what should I do?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before your allocation ends, use ``$(MERLIN_RESTART)`` or ``$(MERLIN_RETRY)`` but
with a ``retry_delay`` on your step that is longer than the time your allocation has left.
The server will hold onto the step for that long (in seconds) before releasing it,
allowing your batch allocation to end without the worker grabbing the step right away.

For instance, your step could look something like this:

.. code:: yaml

    name: batch_task
    description: A long running task that needs to restart
    run:
        cmd: |
            # Run my code, but end 60 seconds before my allocation
            my_code --end_early 60s
            if [ -e restart_needed_flag ]; then
                exit $(MERLIN_RESTART)
            fi
        retry_delay: 120 # wait at least 2 minutes before restarting

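
The rule of thumb in this FAQ entry (choose a ``retry_delay`` longer than the time left in the allocation) can be written as a small Python sketch. The function name ``suggest_retry_delay`` and the 60-second safety margin are assumptions made for illustration; Merlin does not provide this helper.

```python
def suggest_retry_delay(allocation_seconds_left, margin_seconds=60):
    """Pick a retry_delay (in seconds) that outlasts the current allocation.

    Hypothetical helper: the margin is an arbitrary safety buffer.
    """
    if allocation_seconds_left < 0 or margin_seconds < 0:
        raise ValueError("times must be non-negative")
    # Hold the step until the allocation has ended, plus the buffer.
    return allocation_seconds_left + margin_seconds
```

With 60 seconds left in the allocation and the default margin, this gives the 120 seconds used in the step above.
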
How do I mark a step failure?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Each step is ultimately designated as:
@@ -179,6 +213,7 @@ Also under ``run``, the following fields are optional:
task_queue: <task queue name for this step>
shell: <e.g., /bin/bash, /usr/bin/env python3>
max_retries: <integer>
retry_delay: <integer: seconds>
nodes: <integer>
procs: <integer>
@@ -321,6 +356,17 @@ to specify the same launch command that will work on different HPC clusters with
default schedulers such as SLURM or LSF.
More information can be found at the `Flux web page <http://flux-framework.org/docs/home/>`_.


Older versions of flux may need the ``--mpi=none`` argument if flux is
launched on a system using the SLURM scheduler. This argument can be added
in the ``launch_args`` variable in the batch section.

.. code:: yaml

    batch:
        type: flux
        launch_args: --mpi=none

How do I use flux on LC?
~~~~~~~~~~~~~~~~~~~~~~~~
The ``--mpibind=off`` option is currently required when using flux with a slurm launcher
@@ -361,7 +407,7 @@ The arguments the LAUNCHER syntax will use:

procs: The total number of MPI tasks
nodes: The total number of MPI nodes
walltime: The total walltime of the run (hh:mm:ss) (not available in lsf)
walltime: The total walltime of the run (hh:mm:ss or mm:ss or ss) (not available in lsf)
cores per task: The number of hardware threads per MPI task
gpus per task: The number of GPUs per MPI task

6 changes: 4 additions & 2 deletions docs/source/merlin_specification.rst
@@ -60,7 +60,7 @@ see :doc:`./merlin_variables`.
If this is unset the number of nodes will be
queried from the environment, failing that, the
number of nodes will be set to 1.
walltime: The total walltime of the batch allocation (hh:mm:ss)
walltime: The total walltime of the batch allocation (hh:mm:ss or mm:ss or ss)
#####################################
@@ -123,7 +123,7 @@ see :doc:`./merlin_variables`.
# depends: a list of steps this step depends upon (ie parents)
# procs: The total number of MPI tasks
# nodes: The total number of MPI nodes
# walltime: The total walltime of the run (hh:mm:ss) (not available in lsf)
# walltime: The total walltime of the run (hh:mm:ss, mm:ss or ss) (not available in lsf)
# cores per task: The number of hardware threads per MPI task
# gpus per task: The number of GPUs per MPI task
# SLURM specific run flags:
@@ -168,6 +168,8 @@ see :doc:`./merlin_variables`.
nodes: 1
procs: 1
task_queue: lqueue
max_retries: 3 # maximum number of retries
retry_delay: 10 # delay retry for N seconds (default 1)
batch:
type: <override the default batch type>
11 changes: 9 additions & 2 deletions docs/source/merlin_variables.rst
@@ -176,7 +176,10 @@ Step return variables

* - ``$(MERLIN_RESTART)``
- Run this step's ``restart`` command, or re-run ``cmd`` if ``restart``
is absent.
is absent. The default maximum number of retries+restarts for any given step
is 30. You can override this by adding a ``max_retries`` field under the run
field in the specification. Issues a warning. Default will retry in 1 second.
To override the delay time, specify ``retry_delay``.
-
::

@@ -187,11 +190,14 @@ Step return variables
exit $(MERLIN_RESTART)
restart: |
echo "bye, mom!" >> my_file.txt
max_retries: 23
retry_delay: 10

* - ``$(MERLIN_RETRY)``
- Retry this step's ``cmd`` command. The default maximum number of retries for any given step
is 30. You can override this by adding a ``max_retries`` field under the run
field in the specification. Issues a warning.
field in the specification. Issues a warning. Default will retry in 1 second. To override
the delay time, specify ``retry_delay``.
- ::

run:
@@ -200,6 +206,7 @@ Step return variables
echo "hi mom!" >> my_file.txt
exit $(MERLIN_RETRY)
max_retries: 23
retry_delay: 10

* - ``$(MERLIN_SOFT_FAIL)``
- Mark this step as a failure, note in the warning log but keep going.
4 changes: 2 additions & 2 deletions merlin/__init__.py
@@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.7.9.
# This file is part of Merlin, Version: 1.8.0.
#
# For details, see https://github.com/LLNL/merlin.
#
@@ -38,7 +38,7 @@
import sys


__version__ = "1.7.9"
__version__ = "1.8.0"
VERSION = __version__
PATH_TO_PROJ = os.path.join(os.path.dirname(__file__), "")

2 changes: 1 addition & 1 deletion merlin/ascii_art.py
@@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.7.9.
# This file is part of Merlin, Version: 1.8.0.
#
# For details, see https://github.com/LLNL/merlin.
#