Introduce _core internal code structure #122

Spartee · 2022-01-04T02:36:32Z

Description

This PR introduces the _core directory which will now contain all non-user facing classes and functions. These are not meant to be imported by the user. This will make it very clear which modules are to be used within SmartSim.

Because of the large changes in this restructure, it made sense to tackle some of the outstanding technical debt items that were introduced as a result of the growing codebase.

Changes

CLI in a single contained module (_cli)
Replace bash scripts with configurable installation classes (_install)
Some Orchestrator functions split into utilities
Constants module deprecated in favor of smartsim.status
smartsim.log is directly importable
Removal of the TOML configuration option.
Error handling improvements.

CLI

The CLI was expanded into multiple files, one for each command. this will make it much easier to add new commands in the future. This does mean that we are deprecating the current ML backend build in favor of this new approach.

The major change here that will be user facing is that users will now do smart build --device gpu instead of smart --device gpu.

All commands

smart build - build Redis and RedisAI
smart clean - remove RedisAI and ML backends
smart clobber - remove every third-party dependency
smart site - show the smartsim installation site
smart dbcli - return the path to redis-cli

Configurable build classes

The bash scripts normally invoked to build RedisAI and Redis have been replaced with Python classes that are used in both the setup.py (what installs smartsim with pip) and the CLI. SmartSim now has one single class where all third-party dependency versions are listed and configurable through environment variables.

The following environment variables control the pip install time build environment

# The following environment variables represent build time
# options for SmartSim. These are only relevant to when a user
# or CI is invoking the setup.py script.
#
#
# NO_CHECKS
#   If set to 1, the build process will not check for
#   build dependencies like make, gcc, etc
#
# SMARTSIM_REDIS
#   The version of redis to retrive and build with
#
# SMARTSIM_REDIS_URL
#   The URL from which to retrieve redis source code
#
# SMARTREDIS_VERSION
#   The version of SmartRedis to install.
#
# CC
#   The C compiler to use
#
# CXX
#   the CPP compiler to use
#
# MALLOC
#   The memory allocator to use for Redis (options: libc, jemalloc)
#
# BUILD_JOBS
#   Number of jobs to use in the build (defaults to max on node)
#
# SMARTSIM_SUFFIX
#  if set, the version number is set to a developer build
#  with the current version, git-sha, and suffix. This version
#  is then written into SmartSim/smartsim/version.py

This change also enables support for nightly builds of the library as we can specify SMARTSIM_SUFFIX which will trigger the version to be changed to the git sha + current version. (see smartsim/_core/_install/buildenv.py for more.

Deprecating `smartsim.constants`

It didn't make sense for SmartSim to have internal constants and user facing constants in the same location. Mostly users were invoking the constants.py module for the statuses.

the statuses have now been moved to their own module in the top-level of SmartSim: smartsim.status.

A deprecation warning will be thrown if users try to use the old approach for 1 release.

Removal of the TOML configuration option.

The TOML configuration option seemed to confuse users. We have decided to take out the TOML configuration option in favor of a simpler approach using environment variables. Users can set the same values in their shell profiles.

# Runtime Configuration Values
#
# These values can be set through environment variables to
# override the default binaries and configurations used by
# SmartSim.
#
# RAI_PATH
#   - Path to the RAI shared library
#   - Default: /smartsim/smartsim/_core/lib/redisai.so
#
# REDIS_CONF
#   - Path to the redis.conf file
#   - Default: /SmartSim/smartsim/config/redis6.conf
#
# REDIS_PATH
#   - Path to the redis-server executable
#   - Default: /SmartSim/smartsim/bin/redis-server
#
# REDIS_CLI_PATH
#   - Path to the redis-cli executable
#   - Default: /SmartSim/smartsim/bin/redis-cli
#
# SMARTSIM_LOG_LEVEL
#   - Log level for SmartSim
#   - Default: info
#
# SMARTSIM_JM_INTERVAL
#   - polling interval for communication with scheduler
#   - default: 10 seconds
#


# Testing Configuration Values
#
# SMARTSIM_TEST_INTERFACE
#  - Network interface to use for testing
#  - Default: auto-detected
#
# SMARTSIM_TEST_LAUNCHER
#  - type of launcher to use for testing
#  - Default: Local
#
# SMARTSIM_TEST_DEVICE
#  - CPU or GPU for model serving tests
#  - Default: CPU
#
# SMARTSIM_TEST_ACCOUNT
#  - Account used to run full launcher test suite on external systems
#  - Default: None

We now longer require the usage of the TOML module and hence it has been taken out of the requirements.

see smartsim/_core/config for implementation.

TODO

confer with @al-rigazzi about cmake fixes (user issue came up about this today)
Fix the actions CI to use the new CLI commands
Test build on multiple systems
- [x] CPU MacOS
- [x] CPU Linux
- [ ] GPU Linux
Create follow up ticket for investigating RedisAI 1.2.4 bug on clang.
Create follow up tickets for documentation items
- build process
- RedisAI support
- cli usage
- environment variables for build, install, and runtime
rebase and fix conflicts with current development
investigate issue with pkg_resources Version in python 3.7 (see my GH actions for details)

SmartSim CrayLabs#122 brought up a user issue with CUDNN environment variables. To address this the cli now sets both the caffe and torch env vars in the RedisAI build as well, the logging module is now used within the CLI for better output during the build process. this replaces our manual Warning_ class

EricGustin

In addition to the comments I have made, I have a general question about when the documentation will be updated to reflect this PR. Will this PR contain doc updates, or will that occur in a future PR before the release?

EDIT: I see now that the documentation will be written in a follow up ticket.

smartsim/_core/_cli/cli.py

EricGustin · 2022-01-04T23:45:06Z

smartsim/_core/_cli/cli.py

+    def site(self):
+        print(get_install_path())
+        exit(0)
+
+    def dbcli(self):
+        install_path = get_install_path()
+        script_path = install_path.joinpath("_core/bin/redis-cli").resolve()
+        if script_path.is_file():
+            print(script_path)
+            exit(0)
+        else:
+            print("Redis dependencies not found")
+            exit(1)


I know these are really simple commands, and perhaps making a separate file for each could be seen as overkill, but for the sake of consistency, maybe we should think about making Site and Dbcli classes and placing them in their own files so that all commands reside in their own file.

While I think that idea is good in theory, in practice it's a bit overkill. Could split them into utility functions in smartsim/_core/_cli/utils.py though. thoughts?

I think if we're not going to place in their own files, then the current implementation is the preferred way.

smartsim/_core/_cli/build.py

EricGustin · 2022-01-12T18:50:49Z

Makefile

@@ -93,33 +93,38 @@ docs:
 # help: cov                            - generate html coverage report for Python client
 .PHONY: cov
 cov:
-		@cd ./tests/ && coverage html
-		@echo if data was present, coverage report is in ./tests/htmlcov/index.html
+	@coverage html


Is there a reason for changing the coverage reports for the tests to be in the top level rather than in tests/?

there was a reason and I honestly cannot remember at the moment.

Keeping the generated files contained within the tests folder seems most intuitive, but if there is a reason for not doing so, then by all means, placing the coverage files at top level is okay.

tests/full_wlm/test_launch_orc_lsf_batch.py

tests/on_wlm/test_launch_errors.py

EricGustin · 2022-01-12T21:06:22Z

smartsim/_core/_cli/build.py

+                logger.error("RedisAI does not support ONNX on MacOS")
+                exit(1)
+            if self.versions.REDISAI > "1.2.3":
+                logger.error("RedisAI deprecated support for MacOS after 1.2.3 :(")


Is this still the case, or did they curb that?

EricGustin · 2022-01-12T21:13:15Z

smartsim/_core/utils/redisUtils.py

+            host_dict = dict()
+            host_dict["host"] = get_ip_from_host(host)
+            host_dict["port"] = port


host_dict = {"host": get_ip_from_host(host), "port": port} might be easier to read?

Given the growth of the library there was a need to seperate out the user facing functionality from the backend of which users are not supposed to import the structure now includes a _core dir which houses modules which are not using facing. the following have been moved to _core - control - generation - launcher - bin and lib - utils

The previous interim commit changed the import structure such that many imports were now broken in the user facing classes This is now fixed

The TOML configuration option seemed to confuse users and we really want most users to use the default settings anyway. The purpose of the configuration was to ensure each site could setup SmartSim in the way they wanted with a global configuration. This commit replaces that with simple environment variables that provide the same functionality. Site's wishing to have a site-wide configuration can now set this environment variables upon loading the SmartSim module

log and slurm modules should have always been at the top level as they are expected to be imported by the user. we may move to a ``wlm`` module in the future that contains user-facing functions for certain workload managers.

instead of: > from smartsim import constants > constats.STATUS_COMPLETED users should now do > from smartsim import status > status.STATUS_COMPLETED

- create_cluster - check_cluster_status

the cli now has it's own module within _core/_cli. Each function is now split out into it's own file which will make it much easier to add new additions in the future. The build process controlled by setup.py and the ``smart`` cli used to be governed by two different build methods mostly contained in bash scripts. these approaches have been unified into the _core/_install directory which replaces the bash scripts we used to use and works for both the cli and setup.py. There are a few non-obvious benefits to this. - all versions contained in one place - versions can be changed through env vars - build process can be changed through env vars - multiple RAI versions can now be supported - multiple DL framework versions can now be supported

Tests now pass on local MacOS with the new dir stucture, status, and error changes.

Pyproject.toml is now used for the pytest-cov configuration as well as the individual coverage files present in /tests/test_configs/*_cov.cfg

fix the import structure of smartsim.tf and remove the tensorflow > 2.5 warning as we will support tf > 2.6 this release

redisAI 1.2.4 seems to have build issues on mac so we are reverting to 1.2.3 as they use the same ML backend versions

SmartSim CrayLabs#122 brought up a user issue with CUDNN environment variables. To address this the cli now sets both the caffe and torch env vars in the RedisAI build as well, the logging module is now used within the CLI for better output during the build process. this replaces our manual Warning_ class

Minor bug which concern the orch status checks in a few slurm tests have been addressed.

This commit adds the Versioning support we need to support multiple RedisAI versions in SmartSim.

EricGustin

lgtm

Spartee added area: build Issues related to builds, makefiles, installs, etc type: refactor Issues focused on refactoring existing code labels Jan 4, 2022

Spartee requested a review from al-rigazzi January 4, 2022 02:36

Spartee force-pushed the core branch from 5fd3b14 to 4b1daf2 Compare January 4, 2022 06:00

Spartee requested a review from EricGustin January 4, 2022 17:07

This was referenced Jan 4, 2022

Implement Version class for third-party dependency version checking. #123

Closed

Check prerequisite versions for toolchains. #124

Open

Special torch version breaks GPU install of ML backends #121

Closed

EricGustin reviewed Jan 12, 2022

View reviewed changes

Sam Partee added 20 commits January 13, 2022 13:00

Change front-end import structure

faad1ed

The previous interim commit changed the import structure such that many imports were now broken in the user facing classes This is now fixed

Mirgrate the log and slurm modules

c60ec17

log and slurm modules should have always been at the top level as they are expected to be imported by the user. we may move to a ``wlm`` module in the future that contains user-facing functions for certain workload managers.

Rename constants to status

927ee73

instead of: > from smartsim import constants > constats.STATUS_COMPLETED users should now do > from smartsim import status > status.STATUS_COMPLETED

Move Orchestrator functions to _core

c6eca19

- create_cluster - check_cluster_status

make style

c18adf9

clean up circular imports in config

24bbee1

Remove ill-placed SSConfigErrors

fd0311b

Adapt tests for dir change

b24afb5

Tests now pass on local MacOS with the new dir stucture, status, and error changes.

Fix coverage stats

6b45416

Pyproject.toml is now used for the pytest-cov configuration as well as the individual coverage files present in /tests/test_configs/*_cov.cfg

fix smartsim.tf api

d1479ac

fix the import structure of smartsim.tf and remove the tensorflow > 2.5 warning as we will support tf > 2.6 this release

make style

bb8ed36

Fix .gitignore and edit CLI commands in CI

fbd718b

CI fixes

5418504

Revert to RAI 1.2.3

9d8916c

redisAI 1.2.4 seems to have build issues on mac so we are reverting to 1.2.3 as they use the same ML backend versions

Correct comment in config

874001c

Minor cleanup in builder class

a85c0b4

Sam Partee added 12 commits January 13, 2022 13:14

Fix minor issues in logging and setup.py

75e03ce

Addresses testing issues on Slurm

536dd86

Minor bug which concern the orch status checks in a few slurm tests have been addressed.

fix tests

e06d954

buildenv fix

fddff85

refactor build dep checking

68c1acd

small fix to build

829eaef

Add url checks to versioner.py

0e519b5

Add Version_ class and RAI version support

5fa8d14

This commit adds the Versioning support we need to support multiple RedisAI versions in SmartSim.

Update tf to use Version_

1b40ddb

Clean up install classes

b221e91

bump tf and onnx versions

690bd48

Rebase and address review

9fece0f

Spartee force-pushed the core branch from 3565c62 to 9fece0f Compare January 13, 2022 21:57

Sam Partee added 3 commits January 13, 2022 15:16

Branch support for RedisAI and Redis

45ded43

Add ability to support future RAI versions

2518655

make style

96c2250

EricGustin approved these changes Jan 14, 2022

View reviewed changes

Sam Partee added 2 commits January 14, 2022 14:14

Edit Tensorflow cmake

0b6b87d

small fix in Python build

b9d74d0

Spartee merged commit ba824d3 into CrayLabs:develop Jan 17, 2022

Spartee deleted the core branch January 17, 2022 19:02

Spartee mentioned this pull request Jan 17, 2022

Support KeyDB #126

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce _core internal code structure #122

Introduce _core internal code structure #122

Spartee commented Jan 4, 2022 •

edited

Loading

EricGustin left a comment •

edited

Loading

EricGustin Jan 4, 2022

Spartee Jan 13, 2022

EricGustin Jan 14, 2022

EricGustin Jan 12, 2022

Spartee Jan 13, 2022

EricGustin Jan 14, 2022

EricGustin Jan 12, 2022

EricGustin Jan 12, 2022

EricGustin left a comment

Introduce _core internal code structure #122

Introduce _core internal code structure #122

Conversation

Spartee commented Jan 4, 2022 • edited Loading

Description

Changes

CLI

Configurable build classes

Deprecating smartsim.constants

Removal of the TOML configuration option.

TODO

EricGustin left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

EricGustin left a comment

Choose a reason for hiding this comment

Spartee commented Jan 4, 2022 •

edited

Loading

Deprecating `smartsim.constants`

EricGustin left a comment •

edited

Loading