Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ML DataLoaders #115

Merged
merged 125 commits into from
Feb 3, 2022
Merged

ML DataLoaders #115

merged 125 commits into from
Feb 3, 2022

Conversation

al-rigazzi
Copy link
Collaborator

@al-rigazzi al-rigazzi commented Dec 2, 2021

This PR adds ML Data Loaders for Keras/TF and PyTorch.

They follow the download-and-store approach, not the streaming one (thus this could be added). Samples are downloaded from the DB and kept as copy of the data set (which means that they are not discarded after one epoch).

Specializations required some fine tuning, but the behavior is the same for TF and PyTorch. There are super-classes for all downloaders.

@al-rigazzi al-rigazzi marked this pull request as ready for review December 6, 2021 16:28
Copy link
Contributor

@Spartee Spartee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will you run make style on this before I place in review comments?

@Spartee
Copy link
Contributor

Spartee commented Dec 15, 2021

Went to review and looks like there are still a couple conflicts @al-rigazzi

@Spartee Spartee added area: ML Issues related to SmartSim ML classes and utilities type: feature Issues that include feature request or feature idea labels Jan 13, 2022
@Spartee Spartee self-requested a review January 17, 2022 20:45
Copy link
Contributor

@Spartee Spartee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple things

  1. make sure to throw the deprecation wanring for smartsim.tf and lets let the user import that for a release. basically an empty module with the dep and an import works here. use the warnings library and throw DeprecationWarning
  2. Some re-naming for brevity
  3. Lets add tests for all of these in the tests/backends folder.
  4. Use the logger.
  5. Lotta copy paste, lets keep the comments to what users need to see and direct them to the places they need to go for more information if necessary. These should be easy to grok in a VSCode like editor
  6. instead of smartsim.ml.tf.data lets have those imports resolve to smartsim.ml.tf for brevity.
  7. Workers comment may be in a differnt ticket but lets expose that if the functionality works and be sure to document the decision there.
  8. tests tests tests tests

smartsim/ml/data.py Outdated Show resolved Hide resolved
batch_key = form_name(self.sample_prefix, self.batch_idx, sub_index)
self.client.put_tensor(batch_key, samples)
if self.verbose:
print(f"Put batch {batch_key}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger instead here?


from SmartRedis import Client
# simulation initialization code
client = Client(cluster=False, address=None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note here that address is needed if not launched through SmartSim


# producer
producer_script = "producer.py"
settings = RunSettings("python", exe_args=producer_script)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_run_settings

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

# producer
producer_script = "producer.py"
settings = RunSettings("python", exe_args=producer_script)
uploader_model = exp.create_model("producer", settings)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can enable key prefixing here instead and save a line

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had no clue we could do that.

self.init_sources()
self.init_samples()

def log(self, message):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger??

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, added logger now!

`auto`.

- When specifying `auto`, the user must also specify
`uploader_name`. BatchDownloader will get all needed information
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like you might have done some copy paste here.

Copy link
Collaborator Author

@al-rigazzi al-rigazzi Jan 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, now I fixed all names and only kept docstrings of base classes

:type smartredis_cluster: bool
:param smartredis_address: Address of Redis client as <ip_address>:<port>
:type smartredis_address: str
:param replica_rank: When BatchDownloader is used in a distributed setting, indicates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have these an sub-indicies? this doesn't make sense to me, can you explain?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was confusing. The sub-indices are those of the uploader, I renamed them as uploader_ranks, which is just an int.

)

@staticmethod
def worker_init_fn(worker_id):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the user specifying number of workers?

Does this spawn threads or processes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are handled by the super-class torch.DataLoader and corresponding settings.

from os import environ
from time import sleep
import numpy as np
from mpi4py import MPI
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets make sure we have notes in here that specify you need MPI4Py

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it in the Jupyter notebook, should be enough, but I'm open to suggestions.

@MattToast MattToast self-requested a review January 20, 2022 17:09
Copy link
Member

@MattToast MattToast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supper cool looking stuff!! Here's some quick notes for consideration:

  • I'm gonna second shortening up the docstrings to essential information. As someone coming from the outside, it took me while to try and figure out how the sub-classes differed while searching through the boilerplate.
  • In docstrings, multi-line bullet points should end in \ when moving to next line
  • Found a couple of minor errors in the docs.
  • Marked some places in where I thought the code-style seemed a bit off. Address as you see fit.

and one application (the ``training_service``) downloading the samples to train a DNN.

A richer example, entirely implemented in Python, is available as a Jupyter Notebook in the
``tutorials`` section of the SmartSim repository.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a link for reader convenience?

doc/tutorials/machine_learning.rst Outdated Show resolved Hide resolved
doc/tutorials/machine_learning.rst Outdated Show resolved Hide resolved
from smartsim.settings import RunSettings

db = Orchestrator(port=6780)
exp = ("online-training", launcher="local")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exp = Experiment("online-training", launcher="local")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

producer_script = "producer.py"
settings = RunSettings("python", exe_args=producer_script)
uploader_model = exp.create_model("producer", settings)
uploader_model.attach_generator_files(to_copy=script)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to_copy=producer_script

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

else:
sources = []

print(f"{worker_id}: {sources}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace with logger?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this was too verbose anyhow. I removed it.

@@ -0,0 +1,40 @@
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np never accessed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -0,0 +1,55 @@
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np not accessed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

tutorials/04_ml_training/torch/training_service_hvd.py Outdated Show resolved Hide resolved
dataset.init_sources()
overall_sources = dataset.sources

worker_id = worker_info.id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the worker_id param used before being overwritten here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I defer to the Torch docs I took this from. My best guess is that it is not.

@MattToast MattToast mentioned this pull request Jan 25, 2022
al-rigazzi and others added 20 commits January 31, 2022 05:22
Adds various dimensions to the CI build matrix
for SmartSim. The build matrix now uses MacOS
& Ubuntu, GNU8, RedisAI 1.2.3 & 1.2.5, and
Python 3.7-3.9. The build matrix excludes building
with RedisAI 1.2.5 when on MacOS as RedisAI
temporarily removed support for MacOS in 1.2.4
and 1.2.5

[ committed by @EricGustin and @Spartee ]
[ reviewed by @Spartee ]
* Remove np from step.py and requirements

Create a helper function called get_base_36_repr so that we can remove numpy from step.py

Remove numpy from requirements.txt

Pin requirements.dev to a specific version of numpy

* Remove numpy from requirements

* Remove numpy from setup

[ committed by @EricGustin ]
[ reviewed by @Spartee ]
Edits to the Tutorials section of the
documentation to highlight refined api

[ committed by @MattToast ]
[ reviewed by @Spartee ]
This PR adds the ability to build the documentation
inside a docker build so that users don't have to worry
about the specific build steps. The build exports the
docs folder into the current clone so that the dev
 doesn't have to do it themselves. build with
``make docks``. The manual documentation
build is still available with make docs.

There is also a new make tutorials that builds
the tutorials into a developer docker with the
current clone of SmartSim. This will eventually
have a prod docker build too whch downloads the
pinned version of smartsim for the latest release.

[ committed by @Spartee ]
[ reviewed by @al-rigazzi ]
Remove some tutorial files that appear
to have been duplicated.

[ committed by @MattToast ]
[ reviewed by @Spartee ]
Edits to the Tutorials section of the
documentation to highlight refined api

[ committed by @MattToast ]
[ reviewed by @Spartee ]
This PR adds the ability to build the documentation
inside a docker build so that users don't have to worry
about the specific build steps. The build exports the
docs folder into the current clone so that the dev
 doesn't have to do it themselves. build with
``make docks``. The manual documentation
build is still available with make docs.

There is also a new make tutorials that builds
the tutorials into a developer docker with the
current clone of SmartSim. This will eventually
have a prod docker build too whch downloads the
pinned version of smartsim for the latest release.

[ committed by @Spartee ]
[ reviewed by @al-rigazzi ]
Copy link
Contributor

@Spartee Spartee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great work! Excited to see the propagate into the community.

@al-rigazzi al-rigazzi merged commit e9210d4 into CrayLabs:develop Feb 3, 2022
@al-rigazzi al-rigazzi deleted the trainer branch February 10, 2022 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: ML Issues related to SmartSim ML classes and utilities type: feature Issues that include feature request or feature idea
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants