Use a new low-memory approach for tf dataset index shuffling #5863
Merged
16 commits
1d6fa5b Rocketknight1: Use a new low-memory approach for tf dataset index shuffling
b05b748 Rocketknight1: correct fill kwarg
7f936cb Rocketknight1: ...and cast the inputs too
7bd7312 Rocketknight1: Add warnings for older TF
18d92aa Rocketknight1: Fix to use the imported random_index_shuffle
82534e3 Rocketknight1: Switch to_tf_dataset entirely over to the NumPy multiprocessing approach
3011d62 Rocketknight1: Revert "Switch to_tf_dataset entirely over to the NumPy multiprocessi…
3c54400 Rocketknight1: Add explanatory comment
81761db Rocketknight1: TF 2.13 has a specific optimization for
8907bdb Rocketknight1: Fix a couple of rebase errors
f39ba76 Rocketknight1: More merging with the changes in main
323747a Rocketknight1: Fix some indents
e8f051a Rocketknight1: Fix docstring merge
5dfcd87 Rocketknight1: Add clearer TODO
b4cc3ee Rocketknight1: Rename indices -> index to be clearer what the function does now
c14806a Rocketknight1: Expand test to make sure shuffling is working correctly
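
For readers unfamiliar with the idea behind the PR title, here is a minimal pure-Python sketch of low-memory index shuffling: instead of materializing a shuffled array of all indices, each position is mapped to its shuffled index on demand. This is not the code from this PR (the commit messages suggest it relies on TensorFlow's `random_index_shuffle`); the function names `shuffled_index` and `_round_value`, the hash-based Feistel construction, and the `seed` and `rounds` parameters are all assumptions made for illustration.

```python
import hashlib


def _round_value(value, seed, round_no, half_bits):
    # Hypothetical round function: hash the inputs and keep half_bits bits.
    data = f"{seed}:{round_no}:{value}".encode()
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << half_bits) - 1)


def shuffled_index(index, size, seed, rounds=4):
    """Map `index` to a position in a seeded permutation of range(size).

    Uses O(1) memory: no array of indices is ever built.
    """
    if not 0 <= index < size:
        raise IndexError("index out of range")
    # Split the smallest even-width bit domain covering `size` into two halves.
    half_bits = max(1, ((size - 1).bit_length() + 1) // 2)
    mask = (1 << half_bits) - 1
    x = index
    while True:
        left, right = x >> half_bits, x & mask
        for round_no in range(rounds):
            # One Feistel round; each round is invertible, so the whole
            # mapping is a bijection on the power-of-two domain.
            left, right = right, left ^ _round_value(right, seed, round_no, half_bits)
        x = (left << half_bits) | right
        # Cycle walking: re-apply the permutation until we land inside range(size).
        if x < size:
            return x


# Usage: iterate over a dataset of 10 items in shuffled order.
order = [shuffled_index(i, 10, seed=42) for i in range(10)]
```

Every index in `range(size)` appears exactly once in `order`, and the same seed always yields the same order, so a data pipeline could compute each shuffled index lazily per sample rather than holding the full permutation in memory.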
Hi Matt, is `datasets` going to drop Python 3.7 support due to its upcoming EOL? The EOL lands at the end of the month, in case we want to wait and then set the minimum version to 3.8, though I assume some users may still be on 3.7.

It will probably depend on what `transformers` does.