Added ElasticSampler and PyTorch Elastic ImageNet example #2297

Merged: 20 commits merged into master from elastic-sampler on Sep 22, 2020

tgaddair
Collaborator

Fixes #2252.


Signed-off-by: Travis Addair <taddair@uber.com>

@tgaddair tgaddair marked this pull request as ready for review September 22, 2020 17:01
# Training settings
parser = argparse.ArgumentParser(description='Elastic PyTorch ImageNet Example',
                                 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('--train-dir', default=os.path.expanduser('~/imagenet/train'),
Collaborator

As a general improvement for the future: it would be nice to have a script that downloads and prepares the ImageNet data for the examples. From my experience running examples in other repos, it can be painful to get the data into the layout the example expects.

Collaborator Author

Yes, I agree. We should at least provide instructions for doing so. Finding and downloading ImageNet is a pain.
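
Until such a script or instructions exist, a small pre-flight check can at least confirm that the data directory has the shape a training script expects. The following is a minimal sketch, not part of this PR. It assumes a layout with one subdirectory per class under the training directory, and the function name `summarize_image_folder` and the class names in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def summarize_image_folder(root):
    """Return {class_name: file_count} for a one-subdirectory-per-class layout.

    Assumption (not stated in the PR): data lives at <root>/<class_name>/<files>.
    """
    root = Path(root).expanduser()
    if not root.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {root}")
    counts = {}
    # Sort so the report is stable from run to run.
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(1 for f in class_dir.iterdir() if f.is_file())
    if not counts:
        raise ValueError(f"no class subdirectories found under {root}")
    return counts


# Self-contained demo on a temporary directory with hypothetical class names.
with tempfile.TemporaryDirectory() as tmp:
    for name, n_files in [("class_a", 2), ("class_b", 1)]:
        class_dir = Path(tmp) / name
        class_dir.mkdir()
        for i in range(n_files):
            (class_dir / f"img_{i}.JPEG").touch()
    demo_counts = summarize_image_folder(tmp)

print(demo_counts)
```

Running a check like this against the `--train-dir` value before launching a job would surface a missing or misplaced dataset early, rather than partway through worker startup.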

self.rank = rank()

# Exclude any samples we have already processed this epoch
self.remaining_indices = [idx for idx in range(len(self.dataset))
Collaborator

Nit: it might be more efficient to compute a set difference (the set of all indices minus the set of processed indices) instead of iterating over the entire list.

Collaborator Author

In this case we need to preserve the order of remaining_indices so that the ordering is deterministic, which we could not guarantee using a set.
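
To illustrate the trade-off, here is a minimal standalone sketch (not the PR's code). It keeps the processed indices in a set for constant-time membership checks while still walking the dataset indices in order. The names `dataset_len` and `processed_indices` are illustrative assumptions.

```python
dataset_len = 10
processed_indices = {7, 2, 3}  # a set gives O(1) membership checks

# Order-preserving filter: visit indices in ascending order and skip processed ones.
remaining_indices = [idx for idx in range(dataset_len)
                     if idx not in processed_indices]

# A set difference holds the same elements, but Python does not guarantee
# the iteration order of a set, so it should not be relied on for ordering.
as_set = set(range(dataset_len)) - processed_indices

print(remaining_indices)  # [0, 1, 4, 5, 6, 8, 9]
```

Sorting the set difference would also restore a deterministic order, at the cost of an extra O(n log n) sort, so the single ordered pass is the simpler choice.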

@abditag2
Collaborator

Generally, this looks good to me. So, ElasticSampler works only with datasets that are entirely on the disk and not with packages like Petastorm where we are streaming the data in. Is that correct?

@tgaddair
Collaborator Author

> Generally, this looks good to me. So, ElasticSampler works only with datasets that are entirely on the disk and not with packages like Petastorm where we are streaming the data in. Is that correct?

It works with any dataset that can be randomly indexed. Parquet is not particularly well suited to this approach because of its row group storage format, but it could work in theory. In practice, though, we will need to do something else for Petastorm.
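
As a rough illustration of what "randomly indexed" means here, the sketch below uses plain Python rather than Horovod or PyTorch. It is not the ElasticSampler implementation: `SquaresDataset` and `shard_indices` are hypothetical names, and the shared-seed shuffle followed by a stride per rank is one assumed way to split the remaining indices among workers.

```python
import random


class SquaresDataset:
    """Hypothetical map-style dataset: supports len() and integer indexing."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        if not 0 <= idx < self.size:
            raise IndexError(idx)
        return idx * idx


def shard_indices(dataset_len, processed, rank, num_replicas, seed=0):
    """Illustrative only: shuffle unprocessed indices with a shared seed,
    then give each rank every num_replicas-th index."""
    remaining = [i for i in range(dataset_len) if i not in processed]
    # Every rank uses the same seed, so every rank sees the same shuffled order.
    random.Random(seed).shuffle(remaining)
    return remaining[rank::num_replicas]


dataset = SquaresDataset(8)
processed = {0, 1}  # indices already consumed this epoch
shards = [shard_indices(len(dataset), processed, r, num_replicas=2)
          for r in range(2)]
# Any index can be fetched directly, in any order.
samples = [[dataset[i] for i in shard] for shard in shards]
```

A streaming reader that only yields rows sequentially offers no equivalent of `dataset[i]`, which is why an index-based sampler does not map directly onto it.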

@github-actions

Unit Test Results

   457 files  +    7     457 suites  +7   4h 29m 15s ⏱️ - 9m 16s
   618 tests +    1     573 ✔️ ±    0       44 💤 ±  0  1 ✖️ +1 
8 803 runs  -464  7 579 ✔️ -387  1 223 💤 -78  1 ✖️ +1 

results for commit 318cc26 ± comparison against base commit 41b8152

@tgaddair tgaddair merged commit 32e5fdb into master Sep 22, 2020
@tgaddair tgaddair deleted the elastic-sampler branch September 22, 2020 22:01
Development

Successfully merging this pull request may close these issues.

Add some demo of elastic horovod on real dataset.
2 participants