Release 0.6.1 #490

al-rigazzi · 2024-02-15T22:46:59Z

This PR brings master up to date with develop before releasing v0.6.1.

@al-rigazzi

This PR brings develop up to date with master before releasing v0.6.0 [ committed by @al-rigazzi ] [ reviewed by @MattToast ]

@ashao

Torch changed something in the index that serves their prebuilt wheels, which broke installation from it. This updates the `pip install` command within `smart build` to ensure that the appropriate packages can be found. [ committed by @ashao ] [ reviewed by @ankona ]
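A minimal sketch of the kind of command involved, assuming `smart build` shells out to pip with an extra index pointing at PyTorch's wheel server. The helper name `build_pip_install`, the choice of `--extra-index-url` over `--index-url`, the `cpu` index variant, and any version pin are illustrative assumptions, not the actual `smart build` code.

```python
import sys
from typing import List, Sequence

# PyTorch serves wheels from per-variant indexes under this prefix; "cpu" is
# one real variant, but which one smart build targets is an assumption here.
TORCH_INDEX_URL = "https://download.pytorch.org/whl/cpu"


def build_pip_install(packages: Sequence[str], index_url: str) -> List[str]:
    """Return an argv list that installs `packages`, also searching `index_url`.

    `--extra-index-url` keeps PyPI available for dependencies that are not
    hosted on the PyTorch index.
    """
    return [
        sys.executable, "-m", "pip", "install",
        *packages,
        "--extra-index-url", index_url,
    ]
```

The list form can be passed straight to `subprocess.run(..., check=True)` without shell quoting concerns.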

@al-rigazzi

This PR adds concurrency groups to GitHub's CI/CD workflows, preventing multiple workflows from the same PR from being launched concurrently. [ committed by @al-rigazzi ] [ reviewed by @MattToast ]

@mellis13

The sphinx-tabs documentation extension uses a white background for the tabs component, which causes readability issues with the theme we have chosen. Custom CSS has been added to override those components so that they inherit the overall theme color. [ committed by @mellis13 ] [ reviewed by @al-rigazzi ]

@MattToast

Adds infrastructure to fetch RedisAI's dependencies. This removes the need to call RedisAI's `get_deps.sh` script, so that we can fetch newer versions of our machine learning backends than the ones officially supported by RedisAI. Additionally, this upgrades the machine learning Python packages required by SmartSim so that they stay in sync with the backends. This in turn allows us to add ONNX support for Python 3.10. [ committed by @MattToast ] [ reviewed by @ashao ]

@ankona

The implementation makes use of Python `contextvars.ContextVar` to store experiment-specific state. The state is used to dynamically modify experiment-level logging. For example, this driver:

```py
import smartsim

exp1 = smartsim.Experiment('exp-1')
rs1 = exp1.create_run_settings(...)
model1 = exp1.create_model(..., rs1)

exp2 = smartsim.Experiment('other-exp')
rs2 = exp2.create_run_settings(...)
model2 = exp2.create_model(..., rs2)

exp1.start(model1)
exp2.start(model2)
```

results in each experiment dynamically registering `logging.FileHandler` instances that write logs to separate files:

- `/exp-1/.telemetry/smartsim/smartsim.out`
- `/other-exp/.telemetry/smartsim/smartsim.out`

### Key changes:

1. Decorated the experiment API with a contextualizer to enrich the log context
2. Created/used `ContextThread` to ensure threads include current context information
3. Created/used `ContextAwareLogger` to dynamically add file handlers for experiment logs
4. Updated manifest serialization to include paths to experiment-specific log files
5. Added `LowPassFilter` to enable splitting experiment logs across `xxx.out` and `xxx.err`

### Additional minor changes:

1. Moved the `serialize.TELMON_SUBDIR` constant to `Config.telemetry_subdir` to make it more universally available

---------

Co-authored-by: Matt Drozt <drozt@hpe.com>
Co-authored-by: Matt Drozt <matthew.drozt@gmail.com>

[ committed by @ankona ] [ reviewed by @al-rigazzi @MattToast ]
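The context propagation described above can be sketched with the standard library alone. The names `ContextThread` and `LowPassFilter` come from this PR, but the bodies below are an illustrative guess rather than SmartSim's implementation, and `ctx_exp_name` is a hypothetical stand-in for the experiment-specific state. `ContextThread` snapshots the creating thread's context so a worker thread still knows which experiment it belongs to; `LowPassFilter` passes records at or below a level, which is one way to keep warnings and errors out of `xxx.out`.

```python
import contextvars
import logging
import threading
from typing import Any

# Hypothetical stand-in for the experiment-specific state kept in a ContextVar
ctx_exp_name: contextvars.ContextVar[str] = contextvars.ContextVar("exp_name", default="")


class ContextThread(threading.Thread):
    """Thread whose target runs inside a copy of its creator's context."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        # Snapshot the context of the thread that constructs this object
        self._captured_ctx = contextvars.copy_context()
        super().__init__(*args, **kwargs)

    def run(self) -> None:
        # Run the normal Thread.run (which calls the target) inside the snapshot
        self._captured_ctx.run(super().run)


class LowPassFilter(logging.Filter):
    """Let through only records at or below `max_level`."""

    def __init__(self, max_level: int) -> None:
        super().__init__()
        self.max_level = max_level

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno <= self.max_level
```

Attaching `LowPassFilter(logging.INFO)` to the `.out` handler and calling `setLevel(logging.WARNING)` on the `.err` handler would split a single logger's output across the two files.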

@al-rigazzi

As we are not aware of any system still using the Cobalt workload manager, support for it has been removed from SmartSim. [ committed by @al-rigazzi ] [ reviewed by @MattToast @ashao ]

@al-rigazzi

This PR updates GitHub CI/CD actions to their latest versions, as some of those used in the workflows were outdated. [ committed by @al-rigazzi ] [ reviewed by @ashao ]

@MattToast

Quality-of-life `smart validate` improvements:

- Set the `CUDA_VISIBLE_DEVICES` environment variable within `smart validate` before importing any ML dependencies to prevent false negatives on multi-GPU systems
- Move SmartRedis logs from standard output to a dedicated log file in the validation temporary directory
- Suppress an `sklearn` deprecation warning by pinning a `KMeans` constructor argument
- Run the TF test last, as TF may reserve the GPUs it uses

[ committed by @MattToast ] [ reviewed by @al-rigazzi @ashao ]
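Setting `CUDA_VISIBLE_DEVICES` only helps if it happens before a framework initializes CUDA, which some frameworks do at import time. A minimal sketch, assuming a hypothetical `visible_gpu` context manager (not the actual `smart validate` code) that exposes a single device and restores the user's previous value afterwards:

```python
import os
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def visible_gpu(device: int) -> Iterator[None]:
    """Temporarily expose a single GPU through CUDA_VISIBLE_DEVICES."""
    previous: Optional[str] = os.environ.get("CUDA_VISIBLE_DEVICES")
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device)
    try:
        yield
    finally:
        # Restore whatever the user had set before, or unset it entirely
        if previous is None:
            os.environ.pop("CUDA_VISIBLE_DEVICES", None)
        else:
            os.environ["CUDA_VISIBLE_DEVICES"] = previous
```

Because some frameworks defer CUDA initialization until the first GPU call, the safest use is to wrap the whole check (framework import plus the work that touches the GPU), not just the import.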

@MattToast

Add Python 3.11 to SmartSim [ committed by @MattToast ] [ reviewed by @ashao ]

@MattToast

Relax the required version of `typing_extensions` [ committed by @MattToast ] [ reviewed by @ankona ]

@MattToast

This PR adds GitHub Actions that run `black` and `isort` checks. [ reviewed by @MattToast ] [ committed by @amandarichardsonn ]
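The workflow itself lives in GitHub Actions YAML, which is not shown here. As a rough local equivalent, the sketch below builds the check-only invocations of both tools; the helper names and target directories are hypothetical, while `black --check --diff` and `isort --check-only --diff` are standard flags that report problems without rewriting files.

```python
import subprocess
from typing import List, Sequence


def lint_commands(paths: Sequence[str]) -> List[List[str]]:
    """Build check-only invocations; neither tool modifies files."""
    return [
        ["black", "--check", "--diff", *paths],
        ["isort", "--check-only", "--diff", *paths],
    ]


def run_checks(paths: Sequence[str]) -> bool:
    """Run every check (even after a failure) and report whether all passed."""
    codes = [subprocess.run(cmd).returncode for cmd in lint_commands(paths)]
    return all(code == 0 for code in codes)
```

Running both tools before deciding keeps the full set of complaints visible, much like separate CI steps would.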

@MattToast

This PR adds Python type hinting to RunSettings.colocated_db_settings. [ reviewed by @MattToast ] [ committed by @amandarichardsonn ]

This PR fixes the `test_logs.py::test_context_leak` test that was erroneously creating a directory named `some value` in SmartSim's root directory.

@MattToast

Add and ship `py.typed` marker to expose inline type hints. Fix type errors related to SmartRedis. [ committed by @MattToast ] [ reviewed by @al-rigazzi ]
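PEP 561 defines `py.typed` as the marker telling type checkers such as mypy to use a package's inline annotations; it only helps if the file is actually included in the built distribution. A hypothetical helper like the one below (a verification sketch, not part of SmartSim) can confirm whether an installed package ships the marker:

```python
import importlib.util
from pathlib import Path


def ships_inline_types(package: str) -> bool:
    """Return True if the installed top-level `package` contains py.typed."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        # Not installed, or a plain module rather than a package
        return False
    return any(
        (Path(location) / "py.typed").is_file()
        for location in spec.submodule_search_locations
    )
```

For a release that includes this change, `ships_inline_types("smartsim")` would be expected to return `True`.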

@MattToast

This PR adds validation of the time format when requesting a Slurm allocation. Previously, no check was performed, so it was left to the WLM to throw an error. SmartSim now catches an invalid format and throws the error itself. [ reviewed by @MattToast ] [ committed by @amandarichardsonn ]
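Slurm documents these numeric `--time` formats: `minutes`, `minutes:seconds`, `hours:minutes:seconds`, `days-hours`, `days-hours:minutes` and `days-hours:minutes:seconds`. The sketch below validates against that list with a single regular expression; the function name is hypothetical, and SmartSim's own check may accept only a subset (for example `HH:MM:SS`).

```python
import re

# One alternative with a "D-" day prefix, one without; each allows
# up to two extra ":NN" fields after the first number.
_SLURM_TIME = re.compile(r"\d+-\d+(?::\d+){0,2}|\d+(?::\d+){0,2}")


def validate_slurm_time(time: str) -> str:
    """Return `time` unchanged if it matches a Slurm format, else raise ValueError."""
    # fullmatch (rather than match with ^...$) also rejects a trailing newline
    if _SLURM_TIME.fullmatch(time) is None:
        raise ValueError(f"Invalid Slurm time format: {time!r}")
    return time
```

Raising before submission gives the user an error that names the bad value, instead of whatever message the WLM produces.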

@MattToast

The Torch `eval()` method is now called in the tests to resolve warnings related to model tracing. [ reviewed by @MattToast @ashao ] [ committed by @mellis13 ]

@ashao

With the new ml_lib_builder repository, we can now ship a version of libtorch that is compiled for Mac OSX on Apple Silicon (arm64). `RedisAIBuilder` is reworked to detect this platform and retrieve the appropriate version of libtorch. Some additional refactoring was done to improve the internals of this class. [ committed by @ashao ] [ reviewed by @MattToast ]
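Platform detection of this kind can be done with the standard `platform` module. A sketch under that assumption; the function name is hypothetical and the real `RedisAIBuilder` logic is not shown in this PR. The optional arguments exist only to make the check testable.

```python
import platform
from typing import Optional


def is_macos_arm64(system: Optional[str] = None, machine: Optional[str] = None) -> bool:
    """Return True when running on macOS on Apple Silicon."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    # macOS reports "Darwin"; Apple Silicon reports "arm64" (Linux on ARM uses "aarch64")
    return system == "Darwin" and machine == "arm64"
```

A Python interpreter running under Rosetta 2 reports `x86_64`, so it would not be detected as arm64 and would fall back to the Intel libtorch.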

@MattToast

Refactor logic of `Manifest.has_db_objects` to remove excess branching and improve readability/maintainability. [ committed by @MattToast ] [ reviewed by @ankona ]
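The kind of simplification described can be illustrated with `any()`, which short-circuits and replaces nested `if`/`for`/`return` branches. The `Entity` and `Group` classes below are hypothetical stand-ins for a model and an ensemble, not SmartSim's actual `Manifest` types.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Entity:
    # Hypothetical stand-in for a Model: ML models and scripts attached to it
    db_models: List[str] = field(default_factory=list)
    db_scripts: List[str] = field(default_factory=list)

    @property
    def has_db_objects(self) -> bool:
        return bool(self.db_models or self.db_scripts)


@dataclass
class Group(Entity):
    # Hypothetical stand-in for an Ensemble: its own objects plus its members
    members: List[Entity] = field(default_factory=list)

    @property
    def has_db_objects(self) -> bool:
        return super().has_db_objects or any(m.has_db_objects for m in self.members)


def manifest_has_db_objects(entities: Iterable[Entity]) -> bool:
    # One expression instead of a loop with early returns
    return any(e.has_db_objects for e in entities)
```

Pushing the per-entity question into a property lets the top-level check stay a single readable line.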

@MattToast

This PR makes several patch changes to prepare for a SmartSim release, including:

- Set the default value of the "enable telemetry" flag to on. For now, this enables telemetry system-wide until finer-grained control can be established with #460
- Bump the output `manifest.json` version number to match that of `smartdashboard`
- Pin a `watchdog` version to avoid build errors

[ committed by @MattToast @ankona ] [ reviewed by @ankona ]

---------

Co-authored-by: Christopher McBride <christopher.mcbride@gmail.com>
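A default-on flag usually means "enabled unless explicitly turned off". A minimal sketch of reading such a flag from the environment, where both the helper and any variable name passed to it (for example `EXAMPLE_FLAG_TELEMETRY`) are hypothetical rather than SmartSim's actual configuration code:

```python
import os
from typing import Mapping, Optional

_FALSY = {"0", "false", "no", "off"}


def flag_enabled(name: str, default: bool = True,
                 env: Optional[Mapping[str, str]] = None) -> bool:
    """Read an on/off flag; an unset or empty variable yields `default`."""
    environ = os.environ if env is None else env
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() not in _FALSY
```

With `default=True`, users opt out by setting the variable to `0` rather than opting in.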

@al-rigazzi

This PR fixes a bug which prevented the expected behavior when the `SMARTSIM_LOG_LEVEL` environment variable was set to `developer`. [ committed by @al-rigazzi ] [ reviewed by @MattToast @ankona ]

@ankona

This PR prevents duplicate ML model and script names from being added to an Ensemble member if the names already exist. The checks are performed for `Ensemble.add_ml_model()`, `Ensemble.add_model()`, `Ensemble.add_script()` and `Ensemble.add_function()`. [ reviewed by @ankona @MattToast ] [ committed by @amandarichardsonn ]
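A sketch of one way to enforce unique names across ensemble members. `UniqueRegistry` and `add_to_all` are hypothetical, and `ValueError` stands in for whatever exception SmartSim actually raises. The key design choice is to validate every member before mutating any, so a clash on one member does not leave the others partially updated.

```python
from typing import Dict, Generic, Sequence, TypeVar

T = TypeVar("T")


class UniqueRegistry(Generic[T]):
    """Hypothetical per-member store that rejects duplicate names."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._items: Dict[str, T] = {}

    def __contains__(self, name: object) -> bool:
        return name in self._items

    def add(self, name: str, item: T) -> None:
        if name in self._items:
            raise ValueError(f"A {self.kind} named {name!r} already exists")
        self._items[name] = item


def add_to_all(registries: Sequence[UniqueRegistry[T]], name: str, item: T) -> None:
    """Add `item` to every registry, or to none of them if any name clashes."""
    # Validate all members first so a clash leaves no partially updated state
    for reg in registries:
        if name in reg:
            raise ValueError(f"A {reg.kind} named {name!r} already exists")
    for reg in registries:
        reg.add(name, item)
```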

@ashao

SmartSim support for MacOS with Apple Silicon is still fragile for common configurations and does not yet have full feature parity with MacOS on Intel. Specifically, the docs now call out that SmartSim does not build correctly on MacOS on Apple Silicon with Clang 15, and offer a solution. Additionally, the docs highlight that only PyTorch is supported on MacOS for now. [ committed by @ashao ] [ reviewed by @ankona ]

@ashao

Cloning Redis on Apple Silicon results in some of the Redis build scripts having Windows-style line endings. This leads to errors because the interpreter line of these scripts cannot be parsed correctly (e.g. `/bin/sh^M`). To solve this, we now modify the `git clone` for both Redis and RedisAI to use Unix-style line endings when running on MacOS on ARM. [ committed by @ashao and @MattToast ] [ reviewed by @al-rigazzi ] Co-authored-by: Matt Drozt <drozt@hpe.com>
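`git clone --config key=value` sets a repository-local option before checkout, which is one way to control line endings for a single clone. The sketch below assumes `core.autocrlf=false` (check files out exactly as stored, which is LF for these repositories) is the relevant option; the exact setting SmartSim uses, and the helper name, are assumptions.

```python
import platform
from typing import List, Optional


def git_clone_cmd(url: str, dest: str, branch: Optional[str] = None,
                  force_lf: Optional[bool] = None) -> List[str]:
    """Build a `git clone` argv, disabling CRLF conversion on macOS arm64."""
    if force_lf is None:
        force_lf = platform.system() == "Darwin" and platform.machine() == "arm64"
    cmd = ["git", "clone"]
    if branch:
        # Shallow clone of a single tag or branch
        cmd += ["--branch", branch, "--depth", "1"]
    if force_lf:
        # Repository-local override of any global core.autocrlf=true setting
        cmd += ["--config", "core.autocrlf=false"]
    cmd += [url, dest]
    return cmd
```

Setting the option per clone avoids touching the user's global Git configuration.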

@MattToast

This PR updates the changelog to prepare for release. [ reviewed by @MattToast ] [ committed by @amandarichardsonn ]

@amandarichardsonn

Update version number to 0.6.1 [ committed by @amandarichardsonn @MattToast ] [ reviewed by @al-rigazzi ]

MattToast

LGTM!! 🎉

codecov · 2024-02-15T22:55:31Z

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (9d97397) 90.28% compared to head (a931387) 90.61%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #490      +/-   ##
==========================================
+ Coverage   90.28%   90.61%   +0.32%     
==========================================
  Files          60       60              
  Lines        3748     3826      +78     
==========================================
+ Hits         3384     3467      +83     
+ Misses        364      359       -5
```

| Files | Coverage Δ | |
|---|---|---|
| `smartsim/__init__.py` | `100.00% <ø> (ø)` | |
| `smartsim/_core/__init__.py` | `100.00% <ø> (ø)` | |
| `smartsim/_core/config/__init__.py` | `100.00% <ø> (ø)` | |
| `smartsim/_core/config/config.py` | `98.79% <100.00%> (+0.04%)` | ⬆️ |
| `smartsim/_core/control/__init__.py` | `100.00% <ø> (ø)` | |
| `smartsim/_core/control/controller.py` | `87.20% <100.00%> (+0.07%)` | ⬆️ |
| `smartsim/_core/control/job.py` | `94.94% <ø> (ø)` | |
| `smartsim/_core/control/jobmanager.py` | `94.19% <100.00%> (ø)` | |
| `smartsim/_core/control/manifest.py` | `96.52% <100.00%> (+0.30%)` | ⬆️ |
| `smartsim/_core/generation/__init__.py` | `100.00% <ø> (ø)` | |

... and 50 more

al-rigazzi and others added 27 commits December 18, 2023 22:02

Merge pull request #444 from CrayLabs/master

142ddc1

This PR brings develop up to date with master before releasing v0.6.0 [ committed by @al-rigazzi ] [ reviewed by @MattToast ]

Add concurrency group to test workflow (#439)

4f3a9a1

This PR adds concurrency groups to GitHub's CI/CD workflows, preventing multiple workflows from the same PR from being launched concurrently. [ committed by @al-rigazzi ] [ reviewed by @MattToast ]

Remove Cobalt support (#448)

e107932

As we are not aware of any system still using the Cobalt workload manager, support for it has been removed from SmartSim. [ committed by @al-rigazzi ] [ reviewed by @MattToast @ashao ]

Update actions (#446)

cab2ef8

This PR updates GitHub CI/CD actions to their latest versions, as some of those used in the workflows were outdated. [ committed by @al-rigazzi ] [ reviewed by @ashao ]

Python 3.11 Support (#461)

92a3c99

Add Python 3.11 to SmartSim [ committed by @MattToast ] [ reviewed by @ashao ]

Relax typing extensions required version (#459)

e4d1646

Relax the required version of `typing_extensions` [ committed by @MattToast ] [ reviewed by @ankona ]

Add isort/black check to github actions (#464)

50aa382

This PR adds GitHub Actions that run `black` and `isort` checks. [ reviewed by @MattToast ] [ committed by @amandarichardsonn ]

Fixed Typehint for RunSettings.colocated_db_settings (#462)

092163b

This PR adds Python type hinting to RunSettings.colocated_db_settings. [ reviewed by @MattToast ] [ committed by @amandarichardsonn ]

Fix test_logs to prevent generation of dir (#467)

7803f4d

This PR fixes the `test_logs.py::test_context_leak` test that was erroneously creating a directory named `some value` in SmartSim's root directory.

Expose Typehints (#468)

b160c05

Add and ship `py.typed` marker to expose inline type hints. Fix type errors related to SmartRedis. [ committed by @MattToast ] [ reviewed by @al-rigazzi ]

Add eval() to remove Torch warnings during testing (#472)

106d70f

The Torch `eval()` method is now called in the tests to resolve warnings related to model tracing. [ reviewed by @MattToast @ashao ] [ committed by @mellis13 ]

Manifest: has DB objects refactor (#476)

b84b49f

Refactor logic of `Manifest.has_db_objects` to remove excess branching and improve readability/maintainability. [ committed by @MattToast ] [ reviewed by @ankona ]

Use developer log level, protect logger defaults in test (#473)

8408368

This PR fixes a bug which prevented the expected behavior when the `SMARTSIM_LOG_LEVEL` environment variable was set to `developer`. [ committed by @al-rigazzi ] [ reviewed by @MattToast @ankona ]

Update license to include 2024 (#485)

96d6ef0

Updates `Copyright (c) 2021-2023` to `Copyright (c) 2021-2024` in all of the necessary files.

Update to changelog for release (#487)

784fd4e

This PR updates the changelog to prepare for release. [ reviewed by @MattToast ] [ committed by @amandarichardsonn ]

Updating SmartSim version for release (#486)

a931387

Update version number to 0.6.1 [ committed by @amandarichardsonn @MattToast ] [ reviewed by @al-rigazzi ]

al-rigazzi requested review from ankona and MattToast February 15, 2024 22:46

MattToast approved these changes Feb 15, 2024


al-rigazzi merged commit 8b742ec into master Feb 15, 2024
51 of 69 checks passed
