
Release 0.6.1 #490

Merged
merged 27 commits into from
Feb 15, 2024

al-rigazzi
Collaborator

This PR brings master up to date with develop before releasing v0.6.1.

al-rigazzi and others added 27 commits December 18, 2023 22:02
This PR brings develop up to date with master before releasing v0.6.0

[ committed by @al-rigazzi ]
[ reviewed by @MattToast ]
Torch changed their package indexing, which broke installation from
their provided wheels. This updates the `pip install` command within
`smart build` so that the appropriate packages can be found.

[ committed by @ashao ]
[ reviewed by @ankona ]
This PR adds concurrency groups to GitHub's CI/CD workflows, preventing
multiple workflows from the same PR from being launched concurrently.

[ committed by @al-rigazzi ]
[ reviewed by @MattToast ]
The sphinx-tabs documentation extension uses a white background
for the tabs component. This causes readability issues with the theme
that we have chosen. A custom CSS file has been added that overrides
those components so that they inherit the overall theme color.

[ committed by @mellis13 ]
[ reviewed by @al-rigazzi ]
Adds infrastructure to fetch RedisAI's dependencies. This removes the
need to call RedisAI's `get_deps.sh` script so that we can fetch newer
versions of our machine learning backends than the ones officially
supported by RedisAI.

Additionally, this upgrades the machine learning Python packages
required by SmartSim so that they stay up to date with the backends.
This in turn allows us to add ONNX support for Python 3.10.

[ committed by @MattToast ]
[ reviewed by @ashao ]
The implementation makes use of Python `contextvars.ContextVar` to store
experiment-specific state. The state is used to dynamically modify
experiment-level logging.

For example, this driver:

```py
import smartsim

exp1 = smartsim.Experiment('exp-1')
rs1 = exp1.create_run_settings(...)
model1 = exp1.create_model(..., rs1)

exp2 = smartsim.Experiment('other-exp')
rs2 = exp2.create_run_settings(...)
model2 = exp2.create_model(..., rs2)

exp1.start(model1)
exp2.start(model2)
```

Results in each experiment dynamically registering `logging.FileHandler`
instances that write logs to separate files:

- `/exp-1/.telemetry/smartsim/smartsim.out`
- `/other-exp/.telemetry/smartsim/smartsim.out`
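The routing idea can be illustrated with the standard library alone. The block below is a minimal sketch, not SmartSim's `ContextAwareLogger`: the context variable `exp_log_dir`, the `ContextRoutingHandler` class and the `run_in_experiment` helper are hypothetical names, and the sketch assumes one `smartsim.out` file per experiment directory. Each record is written to the file belonging to whichever experiment is active in the current context.

```python
from __future__ import annotations

import contextvars
import logging
from pathlib import Path
from typing import Any, Callable

# Hypothetical context variable holding the active experiment's log directory.
exp_log_dir: contextvars.ContextVar[str] = contextvars.ContextVar(
    "exp_log_dir", default=""
)


class ContextRoutingHandler(logging.Handler):
    """Write each record to a per-experiment file chosen from the current context."""

    def __init__(self) -> None:
        super().__init__()
        self._file_handlers: dict[str, logging.FileHandler] = {}

    def emit(self, record: logging.LogRecord) -> None:
        log_dir = exp_log_dir.get()
        if not log_dir:
            return  # no experiment is active in this context: drop the record
        handler = self._file_handlers.get(log_dir)
        if handler is None:
            # Lazily register a FileHandler the first time an experiment logs.
            Path(log_dir).mkdir(parents=True, exist_ok=True)
            handler = logging.FileHandler(Path(log_dir) / "smartsim.out")
            handler.setFormatter(self.formatter)
            self._file_handlers[log_dir] = handler
        handler.handle(record)

    def close(self) -> None:
        for handler in self._file_handlers.values():
            handler.close()
        super().close()


def run_in_experiment(log_dir: str, fn: Callable[..., Any], *args: Any) -> Any:
    """Call fn with exp_log_dir set, restoring the previous value afterwards."""
    token = exp_log_dir.set(log_dir)
    try:
        return fn(*args)
    finally:
        exp_log_dir.reset(token)
```

For example, with this handler attached to an `INFO`-level logger, `run_in_experiment("exp-1", logger.info, "started model1")` appends the message to `exp-1/smartsim.out`, and a record logged outside any experiment is dropped.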

### Key changes:

1. Decorated the experiment API with a contextualizer to enrich log context
2. Create/Use `ContextThread` to ensure threads include current context
information
3. Create/Use `ContextAwareLogger` to dynamically add file handlers for
experiment logs
4. Updated manifest serialization to include paths to
experiment-specific log files
5. Added `LowPassFilter` to enable splitting experiment logs across
`xxx.out` and `xxx.err`
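Items 2 and 5 rest on two standard-library behaviors: in CPython a new `threading.Thread` normally starts with a fresh context, so context variables set by its creator are not visible inside it, and a `logging.Filter` can cap the levels a handler accepts. The sketch below re-creates both ideas under assumptions: the class names mirror the ones listed above, but the bodies are illustrative rather than SmartSim's code, and the `WARNING` cut-off between `xxx.out` and `xxx.err` and the `split_handlers` helper are hypothetical.

```python
from __future__ import annotations

import contextvars
import logging
import threading
from typing import Any


class ContextThread(threading.Thread):
    """Thread that runs its target inside a copy of its creator's context."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        # Captured in the creating thread, so its context variables carry over.
        self._captured_ctx = contextvars.copy_context()

    def run(self) -> None:
        self._captured_ctx.run(super().run)


class LowPassFilter(logging.Filter):
    """Accept only records strictly below max_level."""

    def __init__(self, max_level: int) -> None:
        super().__init__()
        self.max_level = max_level

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno < self.max_level


def split_handlers(out_path: str, err_path: str) -> list[logging.Handler]:
    """Route records below WARNING to out_path, WARNING and above to err_path."""
    out_handler = logging.FileHandler(out_path)
    out_handler.addFilter(LowPassFilter(logging.WARNING))
    err_handler = logging.FileHandler(err_path)
    err_handler.setLevel(logging.WARNING)
    return [out_handler, err_handler]
```

Attaching both handlers returned by `split_handlers` to one logger sends `INFO` messages only to the `.out` file and `ERROR` messages only to the `.err` file.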

### Additional minor changes:

1. Moved `serialize.TELMON_SUBDIR` constant to `Config.telemetry_subdir`
to make it more universally available

---------

Co-authored-by: Matt Drozt <drozt@hpe.com>
Co-authored-by: Matt Drozt <matthew.drozt@gmail.com>

[ committed by @ankona ]
[ reviewed by @al-rigazzi @MattToast ]
As we are not aware of any system still using the Cobalt workload manager, support for it has been removed from SmartSim.

[ committed by @al-rigazzi ]
[ reviewed by @MattToast @ashao ]
This PR updates the GitHub CI/CD actions to their latest versions, as some of those used in the workflows were outdated.

[ committed by @al-rigazzi ]
[ reviewed by @ashao ]
Quality of life `smart validate` improvements:
- Set `CUDA_VISIBLE_DEVICES` environment variable within `smart
  validate` prior to importing any ML deps to prevent false negatives on
  multi-GPU systems
- Move SmartRedis logs from standard out to dedicated log file in the
  validation temporary directory
- Suppress a `sklearn` deprecation warning by pinning a `KMeans` constructor
  argument
- Run the TF test last, as TF may reserve the GPUs it uses
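The first bullet depends on ordering: the CUDA runtime reads `CUDA_VISIBLE_DEVICES` when it initializes, so the variable has to be set before an ML framework initializes CUDA, and setting it before the first import is the simplest way to guarantee that. A minimal sketch follows; the helper name `import_with_visible_devices` and the default of device `"0"` are assumptions for illustration, not necessarily what `smart validate` does.

```python
import importlib
import os
from types import ModuleType


def import_with_visible_devices(module_name: str, devices: str = "0") -> ModuleType:
    """Restrict visible GPUs, then import an ML framework by name.

    Has no effect on GPU visibility if the framework has already
    initialized CUDA in this process.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = devices
    return importlib.import_module(module_name)
```

With this helper, `torch = import_with_visible_devices("torch")` would give PyTorch a view of at most one GPU, provided nothing in the process had initialized CUDA earlier.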

[ committed by @MattToast ]
[ reviewed by @al-rigazzi @ashao ]
Add Python 3.11 support to SmartSim

[ committed by @MattToast ]
[ reviewed by @ashao ]
Relax the required version of `typing_extensions`

[ committed by @MattToast ]
[ reviewed by @ankona ]
This PR adds GitHub Actions that run `black` and `isort` checks.

[ reviewed by @MattToast ]
[ committed by @amandarichardsonn ]
This PR adds Python type hinting to RunSettings.colocated_db_settings.

[ reviewed by @MattToast ]
[ committed by @amandarichardsonn ]
This PR fixes the `test_logs.py::test_context_leak` test that was
erroneously creating a directory named `some value` in SmartSim's root
directory.

Add and ship a `py.typed` marker to expose inline type hints. Fix
type errors related to SmartRedis.

[ committed by @MattToast ]
[ reviewed by @al-rigazzi ]
This PR merges in functionality to validate the time format when
requesting a Slurm allocation. Previously, no check was performed, leaving
it to the WLM to raise an error. With the new code, SmartSim catches the
malformed value and raises the error itself.
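Failing fast on a malformed time string can be as small as a regular-expression check made before the allocation request is built. This sketch assumes an `HH:MM:SS` format; Slurm itself accepts several other forms (for example `days-hours:minutes:seconds`), and the function name `validate_time_format` is hypothetical.

```python
import re

# Assumed format: hours (one or more digits), then two-digit minutes and seconds.
_TIME_PATTERN = re.compile(r"\d+:[0-5]\d:[0-5]\d")


def validate_time_format(time: str) -> str:
    """Return time unchanged if it looks like HH:MM:SS, else raise ValueError."""
    if not _TIME_PATTERN.fullmatch(time):
        raise ValueError(f"Invalid time format {time!r}; expected HH:MM:SS")
    return time
```

Using `fullmatch` rather than `match` ensures trailing characters such as a stray newline are rejected instead of silently accepted.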

[ reviewed by @MattToast ]
[ committed by @amandarichardsonn ]
`eval()` is now called on the Torch models in the tests to resolve
warnings related to model tracing.

[ reviewed by @MattToast @ashao ]
[ committed by @mellis13 ]
With the new ml_lib_builder repository, we can now ship a version of
libtorch that is compiled for macOS on Apple Silicon (arm64). Here
the RedisAIBuilder class is reworked to detect this platform and
retrieve the appropriate version of libtorch. Some additional
refactoring was done to improve the internals of this class.
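Detecting the target platform needs only the standard library. The sketch below is an assumption about how such a check can look, not the reworked `RedisAIBuilder` code; `platform_key` and `is_macos_arm64` are hypothetical helpers. A builder could branch on `is_macos_arm64()` to pick the libtorch build for Apple Silicon.

```python
from __future__ import annotations

import platform


def platform_key() -> tuple[str, str]:
    """Return a normalized (os, architecture) pair, e.g. ("darwin", "arm64")."""
    system = platform.system().lower()
    machine = platform.machine().lower()
    # Linux reports "aarch64" where macOS reports "arm64"; Windows reports "amd64".
    arch = {"aarch64": "arm64", "amd64": "x86_64"}.get(machine, machine)
    return system, arch


def is_macos_arm64() -> bool:
    return platform_key() == ("darwin", "arm64")
```

An x86_64 Python running under Rosetta 2 on an Apple Silicon Mac reports `x86_64`, so this check describes the interpreter rather than the hardware.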

[ committed by @ashao ]
[ reviewed by @MattToast ]
Refactor logic of `Manifest.has_db_objects` to remove excess branching
and improve readability/maintainability.
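This kind of refactor, replacing nested `if` branches with a single `any()` over a flattened iterable, can be sketched as follows. The `Entity` dataclass and its attribute names are hypothetical stand-ins, not SmartSim's `Manifest` internals.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import chain
from typing import Iterable


@dataclass
class Entity:
    """Hypothetical stand-in for a model or ensemble with attached DB objects."""

    db_models: list[str] = field(default_factory=list)
    db_scripts: list[str] = field(default_factory=list)
    members: list[Entity] = field(default_factory=list)


def _has_own_db_objects(entity: Entity) -> bool:
    return bool(entity.db_models or entity.db_scripts)


def has_db_objects(models: Iterable[Entity], ensembles: Iterable[Entity]) -> bool:
    """True if any model, ensemble, or ensemble member has DB models or scripts."""
    ensembles = list(ensembles)
    candidates = chain(
        models, ensembles, chain.from_iterable(e.members for e in ensembles)
    )
    # any() short-circuits on the first entity that carries DB objects.
    return any(_has_own_db_objects(entity) for entity in candidates)
```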

[ committed by @MattToast ]
[ reviewed by @ankona ]
This PR makes several patch changes to prepare for a SmartSim release
including:

- Set the default value of the "enable telemetry" flag to on.
For now, this enables telemetry system-wide until finer-grained
control is established in #460
- Bump the output `manifest.json` version number to match that of
`smartdashboard`
- Pin a watchdog version to avoid build errors


[ committed by @MattToast @ankona ]
[ reviewed by @ankona ]

---------

Co-authored-by: Christopher McBride <christopher.mcbride@gmail.com>

This PR fixes a bug which prevented the expected behavior when the `SMARTSIM_LOG_LEVEL` environment variable was set to `developer`.

[ committed by @al-rigazzi ]
[ reviewed by @MattToast @ankona ]
Updates `Copyright (c) 2021-2023` to `Copyright (c) 2021-2024`
in all of the necessary files.

This PR prevents ML models and scripts with duplicate names from being
added to an Ensemble member if those names already exist.
The checks are performed for `Ensemble.add_ml_model()`,
`Ensemble.add_model()`, `Ensemble.add_script()` and
`Ensemble.add_function()`.
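A name-uniqueness guard of this kind can be sketched with plain dictionaries. `ToyEnsemble` and its registry layout are hypothetical, and so is the choice to let `add_function` share the script namespace; the sketch only illustrates raising `ValueError` before anything is added.

```python
from __future__ import annotations


class ToyEnsemble:
    """Hypothetical stand-in for an Ensemble; members hold name-keyed registries."""

    def __init__(self, member_names: list[str]) -> None:
        self.members: dict[str, dict[str, dict[str, object]]] = {
            name: {"ml_models": {}, "scripts": {}} for name in member_names
        }

    def add_model(self, name: str) -> None:
        if name in self.members:
            raise ValueError(f"Member {name!r} already exists in the ensemble")
        self.members[name] = {"ml_models": {}, "scripts": {}}

    def _add_unique(self, kind: str, name: str, obj: object) -> None:
        # Check every member before mutating any, so a failure adds nothing.
        for member, registries in self.members.items():
            if name in registries[kind]:
                raise ValueError(f"{name!r} already exists in {kind} of {member!r}")
        for registries in self.members.values():
            registries[kind][name] = obj

    def add_ml_model(self, name: str, model: object) -> None:
        self._add_unique("ml_models", name, model)

    def add_script(self, name: str, script: object) -> None:
        self._add_unique("scripts", name, script)

    def add_function(self, name: str, function: object) -> None:
        # Assumption: functions are stored alongside scripts in one namespace.
        self._add_unique("scripts", name, function)
```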

[ reviewed by @ankona @MattToast ]
[ committed by @amandarichardsonn ]
SmartSim support for macOS on Apple Silicon is still fragile for common
configurations and does not have full feature parity with macOS on
Intel. Specifically, the docs now call out that SmartSim does not build
correctly on macOS on Apple Silicon with Clang 15, and they offer a
solution. Additionally, the docs highlight that only PyTorch is
supported on macOS for now.

[ committed by @ashao ]
[ reviewed by @ankona ]
Cloning Redis on Apple Silicon results in some of the Redis build scripts
having Windows-style line endings. This leads to errors because the
interpreter line of these scripts cannot be parsed correctly (e.g.
`/bin/sh^M`). To solve this, we now modify the `git clone` for both Redis
and RedisAI to use Unix-style line endings when building on macOS on ARM.
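A clone that pins line endings can pass git configuration at clone time, because `git clone --config key=value` writes the setting into the new repository before any files are checked out. This is a sketch under assumptions: the helper `build_clone_command` is hypothetical, and `core.autocrlf=false` with `core.eol=lf` are one reasonable choice of options rather than the exact ones SmartSim passes.

```python
from __future__ import annotations


def build_clone_command(url: str, dest: str, branch: str, force_lf: bool) -> list[str]:
    """Build a shallow `git clone` argv, optionally forcing LF line endings."""
    cmd = ["git", "clone", "--depth", "1", "--branch", branch]
    if force_lf:
        # --config values are set in the new repository before checkout,
        # so they govern how working-tree files are written.
        cmd += ["--config", "core.autocrlf=false", "--config", "core.eol=lf"]
    cmd += [url, dest]
    return cmd
```

The resulting list can be passed directly to `subprocess.run(..., check=True)`, with `force_lf` set only when building on macOS on ARM.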

[ committed by @ashao and @MattToast ]
[ reviewed by @al-rigazzi ]

Co-authored-by: Matt Drozt <drozt@hpe.com>

This PR updates the changelog to prepare for release.

[ reviewed by @MattToast ]
[ committed by @amandarichardsonn ]
Update version number to 0.6.1

[ committed by @amandarichardsonn @MattToast ]
[ reviewed by @al-rigazzi ]
@MattToast (Member) left a comment


LGTM!! 🎉


codecov bot commented Feb 15, 2024

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (9d97397) 90.28% compared to head (a931387) 90.61%.

Additional details and impacted files


```diff
@@            Coverage Diff             @@
##           master     #490      +/-   ##
==========================================
+ Coverage   90.28%   90.61%   +0.32%     
==========================================
  Files          60       60              
  Lines        3748     3826      +78     
==========================================
+ Hits         3384     3467      +83     
+ Misses        364      359       -5     
```
| Files | Coverage Δ |
|---|---|
| `smartsim/__init__.py` | 100.00% <ø> (ø) |
| `smartsim/_core/__init__.py` | 100.00% <ø> (ø) |
| `smartsim/_core/config/__init__.py` | 100.00% <ø> (ø) |
| `smartsim/_core/config/config.py` | 98.79% <100.00%> (+0.04%) ⬆️ |
| `smartsim/_core/control/__init__.py` | 100.00% <ø> (ø) |
| `smartsim/_core/control/controller.py` | 87.20% <100.00%> (+0.07%) ⬆️ |
| `smartsim/_core/control/job.py` | 94.94% <ø> (ø) |
| `smartsim/_core/control/jobmanager.py` | 94.19% <100.00%> (ø) |
| `smartsim/_core/control/manifest.py` | 96.52% <100.00%> (+0.30%) ⬆️ |
| `smartsim/_core/generation/__init__.py` | 100.00% <ø> (ø) |

... and 50 more

@al-rigazzi merged commit 8b742ec into master on Feb 15, 2024
51 of 69 checks passed
7 participants