Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce _core internal code structure #122

Merged
merged 37 commits into from
Jan 17, 2022
Merged

Conversation

Spartee
Copy link
Contributor

@Spartee Spartee commented Jan 4, 2022

Description

This PR introduces the _core directory which will now contain all non-user facing classes and functions. These are not meant to be imported by the user. This will make it very clear which modules are to be used within SmartSim.

Because of the large changes in this restructure, it made sense to tackle some of the outstanding technical debt items that were introduced as a result of the growing codebase.

Changes

  • CLI in a single contained module (_cli)
  • Replace bash scripts with configurable installation classes (_install)
  • Some Orchestrator functions split into utilities
  • Constants module deprecated in favor of smartsim.status
  • smartsim.log is directly importable
  • Removal of the TOML configuration option.
  • Error handling improvements.

CLI

The CLI was expanded into multiple files, one for each command. this will make it much easier to add new commands in the future. This does mean that we are deprecating the current ML backend build in favor of this new approach.

The major change here that will be user facing is that users will now do smart build --device gpu instead of smart --device gpu.

All commands

  • smart build - build Redis and RedisAI
  • smart clean - remove RedisAI and ML backends
  • smart clobber - remove every third-party dependency
  • smart site - show the smartsim installation site
  • smart dbcli - return the path to redis-cli

Configurable build classes

The bash scripts normally invoked to build RedisAI and Redis have been replaced with Python classes that are used in both the setup.py (what installs smartsim with pip) and the CLI. SmartSim now has one single class where all third-party dependency versions are listed and configurable through environment variables.

The following environment variables control the pip install time build environment

# The following environment variables represent build time
# options for SmartSim. These are only relevant to when a user
# or CI is invoking the setup.py script.
#
#
# NO_CHECKS
#   If set to 1, the build process will not check for
#   build dependencies like make, gcc, etc
#
# SMARTSIM_REDIS
#   The version of redis to retrive and build with
#
# SMARTSIM_REDIS_URL
#   The URL from which to retrieve redis source code
#
# SMARTREDIS_VERSION
#   The version of SmartRedis to install.
#
# CC
#   The C compiler to use
#
# CXX
#   the CPP compiler to use
#
# MALLOC
#   The memory allocator to use for Redis (options: libc, jemalloc)
#
# BUILD_JOBS
#   Number of jobs to use in the build (defaults to max on node)
#
# SMARTSIM_SUFFIX
#  if set, the version number is set to a developer build
#  with the current version, git-sha, and suffix. This version
#  is then written into SmartSim/smartsim/version.py

This change also enables support for nightly builds of the library as we can specify SMARTSIM_SUFFIX which will trigger the version to be changed to the git sha + current version. (see smartsim/_core/_install/buildenv.py for more.

Deprecating smartsim.constants

It didn't make sense for SmartSim to have internal constants and user facing constants in the same location. Mostly users were invoking the constants.py module for the statuses.

the statuses have now been moved to their own module in the top-level of SmartSim: smartsim.status.

A deprecation warning will be thrown if users try to use the old approach for 1 release.

Removal of the TOML configuration option.

The TOML configuration option seemed to confuse users. We have decided to take out the TOML configuration option in favor of a simpler approach using environment variables. Users can set the same values in their shell profiles.

# Runtime Configuration Values
#
# These values can be set through environment variables to
# override the default binaries and configurations used by
# SmartSim.
#
# RAI_PATH
#   - Path to the RAI shared library
#   - Default: /smartsim/smartsim/_core/lib/redisai.so
#
# REDIS_CONF
#   - Path to the redis.conf file
#   - Default: /SmartSim/smartsim/config/redis6.conf
#
# REDIS_PATH
#   - Path to the redis-server executable
#   - Default: /SmartSim/smartsim/bin/redis-server
#
# REDIS_CLI_PATH
#   - Path to the redis-cli executable
#   - Default: /SmartSim/smartsim/bin/redis-cli
#
# SMARTSIM_LOG_LEVEL
#   - Log level for SmartSim
#   - Default: info
#
# SMARTSIM_JM_INTERVAL
#   - polling interval for communication with scheduler
#   - default: 10 seconds
#


# Testing Configuration Values
#
# SMARTSIM_TEST_INTERFACE
#  - Network interface to use for testing
#  - Default: auto-detected
#
# SMARTSIM_TEST_LAUNCHER
#  - type of launcher to use for testing
#  - Default: Local
#
# SMARTSIM_TEST_DEVICE
#  - CPU or GPU for model serving tests
#  - Default: CPU
#
# SMARTSIM_TEST_ACCOUNT
#  - Account used to run full launcher test suite on external systems
#  - Default: None

We now longer require the usage of the TOML module and hence it has been taken out of the requirements.

see smartsim/_core/config for implementation.

TODO

  • confer with @al-rigazzi about cmake fixes (user issue came up about this today)
  • Fix the actions CI to use the new CLI commands
  • Test build on multiple systems
    - [x] CPU MacOS
    - [x] CPU Linux
    - [ ] GPU Linux
  • Create follow up ticket for investigating RedisAI 1.2.4 bug on clang.
  • Create follow up tickets for documentation items
    - build process
    - RedisAI support
    - cli usage
    - environment variables for build, install, and runtime
  • rebase and fix conflicts with current development
  • investigate issue with pkg_resources Version in python 3.7 (see my GH actions for details)

@Spartee Spartee added area: build Issues related to builds, makefiles, installs, etc type: refactor Issues focused on refactoring existing code labels Jan 4, 2022
Spartee pushed a commit to Spartee/SmartSim that referenced this pull request Jan 8, 2022
SmartSim CrayLabs#122 brought up a user issue with
CUDNN environment variables. To address this
the cli now sets both the caffe and torch env
vars in the RedisAI build

as well, the logging module is now used within
the CLI for better output during the build process.
this replaces our manual Warning_ class
Copy link
Contributor

@EricGustin EricGustin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the comments I have made, I have a general question about when the documentation will be updated to reflect this PR. Will this PR contain doc updates, or will that occur in a future PR before the release?

EDIT: I see now that the documentation will be written in a follow up ticket.

smartsim/_core/_cli/cli.py Show resolved Hide resolved
smartsim/_core/_cli/cli.py Show resolved Hide resolved
smartsim/_core/_cli/cli.py Outdated Show resolved Hide resolved
Comment on lines +47 to +62
def site(self):
print(get_install_path())
exit(0)

def dbcli(self):
install_path = get_install_path()
script_path = install_path.joinpath("_core/bin/redis-cli").resolve()
if script_path.is_file():
print(script_path)
exit(0)
else:
print("Redis dependencies not found")
exit(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know these are really simple commands, and perhaps making a separate file for each could be seen as overkill, but for the sake of consistency, maybe we should think about making Site and Dbcli classes and placing them in their own files so that all commands reside in their own file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I think that idea is good in theory, in practice it's a bit overkill. Could split them into utility functions in smartsim/_core/_cli/utils.py though. thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we're not going to place in their own files, then the current implementation is the preferred way.

smartsim/_core/_cli/build.py Outdated Show resolved Hide resolved
@@ -93,33 +93,38 @@ docs:
# help: cov - generate html coverage report for Python client
.PHONY: cov
cov:
@cd ./tests/ && coverage html
@echo if data was present, coverage report is in ./tests/htmlcov/index.html
@coverage html
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for changing the coverage reports for the tests to be in the top level rather than in tests/?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there was a reason and I honestly cannot remember at the moment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping the generated files contained within the tests folder seems most intuitive, but if there is a reason for not doing so, then by all means, placing the coverage files at top level is okay.

tests/full_wlm/test_launch_orc_lsf_batch.py Outdated Show resolved Hide resolved
tests/on_wlm/test_launch_errors.py Show resolved Hide resolved
logger.error("RedisAI does not support ONNX on MacOS")
exit(1)
if self.versions.REDISAI > "1.2.3":
logger.error("RedisAI deprecated support for MacOS after 1.2.3 :(")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still the case, or did they curb that?

Comment on lines +93 to +95
host_dict = dict()
host_dict["host"] = get_ip_from_host(host)
host_dict["port"] = port
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

host_dict = {"host": get_ip_from_host(host), "port": port} might be easier to read?

Sam Partee added 20 commits January 13, 2022 13:00
Given the growth of the library there
was a need to seperate out the user
facing functionality from the backend
of which users are not supposed to import

the structure now includes a _core dir
which houses modules which are not using
facing.

the following have been moved to _core
  - control
  - generation
  - launcher
  - bin and lib
  - utils
The previous interim commit changed the import
structure such that many imports were
now broken in the user facing classes

This is now fixed
The TOML configuration option seemed to
confuse users and we really want most users
to use the default settings anyway. The purpose
of the configuration was to ensure each site
could setup SmartSim in the way they wanted with
a global configuration.

This commit replaces that with simple environment
variables that provide the same functionality.

Site's wishing to have a site-wide configuration
can now set this environment variables upon
loading the SmartSim module
log and slurm modules should have always been
at the top level as they are expected to be
imported by the user.

we may move to a ``wlm`` module in the future
that contains user-facing functions for certain
workload managers.
instead of:

 > from smartsim import constants
 > constats.STATUS_COMPLETED

users should now do

 > from smartsim import status
 > status.STATUS_COMPLETED
 - create_cluster
 - check_cluster_status
the cli now has it's own module
within _core/_cli. Each function is
now split out into it's own file which
will make it much easier to add new additions
in the future.

The build process controlled by setup.py and
the ``smart`` cli used to be governed by two
different build methods mostly contained in
bash scripts.

these approaches have been unified into the
_core/_install directory which replaces the
bash scripts we used to use and works for both
the cli and setup.py.

There are a few non-obvious benefits to this.
 - all versions contained in one place
 - versions can be changed through env vars
 - build process can be changed through env vars
 - multiple RAI versions can now be supported
 - multiple DL framework versions can now be supported
Tests now pass on local MacOS with the
new dir stucture, status, and error changes.
Pyproject.toml is now used for the pytest-cov
configuration as well as the individual coverage
files present in /tests/test_configs/*_cov.cfg
fix the import structure of smartsim.tf and
remove the tensorflow > 2.5 warning as we
will support tf > 2.6 this release
redisAI 1.2.4 seems to have build issues on mac
so we are reverting to 1.2.3 as they use the
same ML backend versions
SmartSim CrayLabs#122 brought up a user issue with
CUDNN environment variables. To address this
the cli now sets both the caffe and torch env
vars in the RedisAI build

as well, the logging module is now used within
the CLI for better output during the build process.
this replaces our manual Warning_ class
Copy link
Contributor

@EricGustin EricGustin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@Spartee Spartee merged commit ba824d3 into CrayLabs:develop Jan 17, 2022
@Spartee Spartee deleted the core branch January 17, 2022 19:02
@Spartee Spartee mentioned this pull request Jan 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: build Issues related to builds, makefiles, installs, etc type: refactor Issues focused on refactoring existing code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants