v0.4.2

@ashao ashao released this 12 Apr 20:48
7e04b09

Released on April 12, 2023

Description

This release of SmartSim focused on polishing and extending existing
features. Most notably, it adds support for colocating models with an
orchestrator over Unix domain sockets and for launching models as
batch jobs.

Additionally, SmartSim has updated its toolchain to provide a better
user experience. Notably, SmartSim can now be used with Python 3.10,
Redis 7.0.5, and RedisAI 1.2.7. Furthermore, SmartSim now uses
SmartRedis's aggregation lists to streamline the use and extension of
ML data loaders, which simplifies working with popular machine learning
frameworks in SmartSim.

A full list of changes and detailed notes can be found below:

  • Add support for colocating an orchestrator over UDS
  • Add support for Python 3.10, deprecate support for Python 3.7 and
    RedisAI 1.2.3
  • Drop support for Ray
  • Update ML data loaders to make use of SmartRedis's aggregation
    lists
  • Allow for models to be launched independently as batch jobs
  • Update Redis to version 7.0.5
  • Add support for RedisAI 1.2.7, PyTorch 1.11.0, TensorFlow 2.8.0,
    ONNX Runtime 1.11.1
  • Fix bug in colocated database entrypoint when loading PyTorch models
  • Fix test suite behavior with environment variables

Detailed Notes

  • Running some tests could leave SmartSim-specific environment
    variables set. Such environment variables are now reset after each
    test execution. Also, a warning about environment variable usage in
    Slurm was added, to make the user aware when an environment
    variable will not be assigned the desired value with `--export`.
    (PR270)
  • The PyTorch and TensorFlow data loaders were updated to make use of
    aggregation lists. This breaks their API, but makes them easier to
    use. (PR264)
  • The support for Ray was dropped, as its most recent versions caused
    problems when deployed through SmartSim. We plan to release a
    separate add-on library to accomplish the same results. If you are
    interested in getting the Ray launch functionality back in your
    workflow, please get in touch with us!
    (PR263)
  • Update from Redis version 6.0.8 to 7.0.5.
    (PR258)
  • Adds support for Python 3.10 without the ONNX machine learning
    backend. Deprecates support for Python 3.7 as it will stop receiving
    security updates. Deprecates support for RedisAI 1.2.3. Updates the
    build process so that it correctly fetches supported dependencies.
    If a user attempts to build an unsupported dependency, an error
    message is shown highlighting the discrepancy.
    (PR256)
  • Models were given a `batch_settings` attribute. When launching a
    model through `Experiment.start`, the `Experiment` first checks for
    a non-null value at that attribute. If the check is satisfied, the
    `Experiment` attempts to wrap the underlying run command in a batch
    job, using the object referenced at `Model.batch_settings` as the
    batch settings for the job. If the check is not satisfied, the
    `Model` is launched in the traditional manner as a job step.
    (PR245)
  • Fix bug in colocated database entrypoint stemming from uninitialized
    variables. This bug affects PyTorch models being loaded into the
    database. (PR237)
  • The release of RedisAI 1.2.7 allows us to update support for recent
    versions of PyTorch, TensorFlow, and ONNX.
    (PR234)
  • Make installation of the correct Torch backend more reliable by
    following the installation instructions from PyTorch.
  • In addition to TCP, add UDS support for colocating an orchestrator
    with models. Methods `Model.colocate_db_tcp` and
    `Model.colocate_db_uds` were added to expose this functionality.
    The `Model.colocate_db` method remains and uses TCP for backward
    compatibility.
    (PR246)
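
The batch-launch rule described in the PR245 note can be sketched in
plain Python. This is only an illustration of the decision, not
SmartSim's implementation: the `BatchSettings` and `Model` classes
below are hypothetical stand-ins, `plan_launch` is an invented helper,
and the sketch assumes that "non-null" means "not `None`".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchSettings:
    """Hypothetical stand-in for a batch settings object."""
    nodes: int = 1


@dataclass
class Model:
    """Hypothetical stand-in for a model; only the fields needed here."""
    name: str
    run_command: str
    batch_settings: Optional[BatchSettings] = None


def plan_launch(model: Model) -> str:
    """Describe how the model would be launched under the rule above."""
    if model.batch_settings is not None:
        # Settings present: wrap the run command in a batch job.
        nodes = model.batch_settings.nodes
        return f"batch job ({nodes} node(s)): {model.run_command}"
    # No settings: launch in the traditional manner, as a job step.
    return f"job step: {model.run_command}"
```

Under these assumptions, a model created without batch settings is
planned as a job step, while the same model with a `BatchSettings`
object attached is planned as a batch job.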
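
The environment cleanup described in the PR270 note can be
approximated with the standard library alone. The sketch below is not
the mechanism used in the SmartSim test suite; `restored_environ` is a
hypothetical helper that snapshots the process environment and puts it
back after a block of test code has run.

```python
import os
from contextlib import contextmanager


@contextmanager
def restored_environ():
    """Snapshot os.environ on entry and restore it exactly on exit."""
    snapshot = dict(os.environ)
    try:
        yield
    finally:
        # Drop anything the block added or changed, then restore the
        # original variables and values.
        os.environ.clear()
        os.environ.update(snapshot)
```

Any variable set inside a `with restored_environ():` block is removed
again when the block exits, and variables the block modified or deleted
get their original values back, even if the block raises.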