Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download and Kinetics 400/600/700 Datasets #3680

Merged
merged 72 commits into from
Jun 10, 2021

Conversation

bjuncek
Copy link
Contributor

@bjuncek bjuncek commented Apr 16, 2021

Addresses #3452.

@bjuncek
Copy link
Contributor Author

bjuncek commented Apr 16, 2021

@pmeier is there a nice/neat way to read in CSV annotation file and have it be queryable without using pandas?

@pmeier pmeier changed the title #3452 Download and Kinetics 6/700 Datasets #3452 Download and Kinetics 400/600/700 Datasets Apr 19, 2021
Copy link
Collaborator

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Bruno, thanks for the PR! I did an initial pass. Comments below.

torchvision/datasets/kinetics.py Outdated Show resolved Hide resolved
torchvision/datasets/kinetics.py Outdated Show resolved Hide resolved
torchvision/datasets/kinetics.py Outdated Show resolved Hide resolved
torchvision/datasets/kinetics.py Outdated Show resolved Hide resolved
torchvision/datasets/kinetics.py Outdated Show resolved Hide resolved
torchvision/datasets/kinetics.py Outdated Show resolved Hide resolved
torchvision/datasets/kinetics.py Outdated Show resolved Hide resolved
torchvision/datasets/kinetics.py Outdated Show resolved Hide resolved
bjuncek and others added 9 commits April 20, 2021 14:48
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
@bjuncek
Copy link
Contributor Author

bjuncek commented Apr 22, 2021

@pmeier I've fixed all the issues you mentioned, plus removed pandas dependency. Can you please check if all looks well.
Also, additionally, should we add a download test?

Copy link
Collaborator

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly style related comments left.

Also, additionally, should we add a download test?

Yes, please. If you need help to set them up, please let me know. Happy to help.

Plus, since this adds new parameters (n_classes and split), we should also include them in our dataset test

class Kinetics400TestCase(datasets_utils.VideoDatasetTestCase):

You can do so by adding a ADDITIONAL_CONFIGS class variable like

ADDITIONAL_CONFIGS = datasets_utils.combinations_grid(
        n_classes=("400", "600", "700"), split=("train", "val")
    )

torchvision/datasets/kinetics.py Outdated Show resolved Hide resolved
torchvision/datasets/kinetics.py Outdated Show resolved Hide resolved
torchvision/datasets/kinetics.py Outdated Show resolved Hide resolved
torchvision/datasets/kinetics.py Outdated Show resolved Hide resolved
torchvision/datasets/kinetics.py Outdated Show resolved Hide resolved
torchvision/datasets/kinetics.py Show resolved Hide resolved
torchvision/datasets/kinetics.py Outdated Show resolved Hide resolved
torchvision/datasets/kinetics.py Outdated Show resolved Hide resolved
torchvision/datasets/kinetics.py Outdated Show resolved Hide resolved
torchvision/datasets/kinetics.py Show resolved Hide resolved
Copy link
Collaborator

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome, thanks Bruno!

Copy link
Member

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thanks Bruno!

I only have one comment regarding the order of the dimensions for a frame. If we are to be deprecating Kinetics400, I think we might as well change the format to perform a .transpose() if we use the legacy format

torchvision/datasets/kinetics.py Show resolved Hide resolved
Copy link
Member

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@fmassa fmassa merged commit 8ea04d1 into pytorch:master Jun 10, 2021
facebook-github-bot pushed a commit that referenced this pull request Jun 14, 2021
Summary:
* Initial commit

* pmeiers comments

* pmeiers changes

* pmeiers comments

* replace pandas with system library to avoid crashes

* Lint

* Lint

* fixing unittest

* Minor comments removal

* pmeier comments

* remove asserts

* address pmeier formatting changes

* address pmeier changes

* pmeier changes

* rename n_classes to num_classes

* formatting changes

* doc change to add ".mp4" to backported class

* formatting to correct line length

* adding **kwargs to Kinetics400 class

* remove urlib request and download the file directly

* annotations and files can be already downloaded

* test fix

* add download tests for Kinetics

* users now dont need to provide full path within the root for new Kinetics dataset

* linter

* Update test/test_datasets_download.py

* Update torchvision/datasets/kinetics.py

* revert whitespace (3680#discussion_r626382842)

* addressing annotation_path parameter which is unnecessary

* Update torchvision/datasets/kinetics.py

* Update torchvision/datasets/kinetics.py

* kwargs update

* expose num_download_workers as public

* swap os.isfile with check_integrity

* nit on private things

* special case if there are no default arguments

* revert changes to kinetics400 test case for BC

* add split_folder changes and support for legacy format

* pmeiers suggestions

* pmeiers suggestions - root comment

* pmeiers comments - annotation attribute remmoved

* pmeiers suggestion

* pmeiers suggestion

* pmeiers suggestion

* pmeiers suggestion

* Update torchvision/datasets/kinetics.py

* Update torchvision/datasets/kinetics.py

* Update torchvision/datasets/kinetics.py

* Update torchvision/datasets/kinetics.py

* Update torchvision/datasets/kinetics.py

* Update torchvision/datasets/kinetics.py

* minor debugging

* nit picks

* only include public kwargs into defaults

* add _use_legacy_structure in favour of **kwargs

* add type hints for Kinetics400

* flake8

* flake8

* flake8

* rename to make thigs clearer

* permuting the output

Reviewed By: fmassa

Differential Revision: D29097736

fbshipit-source-id: 1de2119e82eadbbba682f0897f03aba3929e3604

Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants