-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[POC] Base class for dataset tests #3402
Conversation
11773da
to
e85f996
Compare
I suspect the tests fail on Windows, because |
Codecov Report
@@ Coverage Diff @@
## master #3402 +/- ##
==========================================
+ Coverage 74.83% 75.13% +0.30%
==========================================
Files 105 105
Lines 9722 9722
Branches 1563 1563
==========================================
+ Hits 7275 7305 +30
+ Misses 1960 1930 -30
Partials 487 487
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
I have a few comments, specially about the handling of PIL opened images on Windows, let me know what you think.
About your questions:
1 - I might be missing something here, but why not leave those to be available in the config
?
2 - Let's leave transform testing out for now. There might be a way to test that the transform is called without having to let the user specify it. Maybe by creating dummy transformer classes that return a specific dummy value, and checking the presence of this dummy value in the returned dataset.
I think that would be only reasonable if there is a good and independent default value: vision/torchvision/datasets/mnist.py Line 263 in c1f85d3
The datasets I mentioned require an additional directory where the assumptions cannot be met. The annotation directory should be inside temporary directory which is only created shortly before vision/torchvision/datasets/ucf101.py Line 58 in c1f85d3
Thus if the annotation directory is inside In most cases the temporary directory passed to
So instead of checking if the returned values are correct you only want to check if the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Summary: * add base class for datasets tests * add better type hints * add documentation to subclasses * add utility functions to create files / folders of random images and videos * fix imports * remove class properties * fix smoke test * fix type hints * fix random size generation * add Caltech256 as example * add utility function to create grid of combinations * add CIFAR100? as example * lint * add missing import * improve documentation * create 1 frame videos by default * remove obsolete check * return path of files created with utility functions * [test] close PIL file handles before deletion * fix video folder creation * generalize file handle closing * fix lazy imports * add test for transforms * fix explanation comment * lint * force load opened PIL images * lint * copy default config to avoid inplace modification * enable additional arg forwarding Reviewed By: NicolasHug Differential Revision: D26605320 fbshipit-source-id: c54903ea5a6b57ddfa57c560e579e0834cf0cc11
Proof-of-concept for #3375.
We provide the fully-documented
DatasetTestCase
class that needs to be subclassed for each individual dataset. In most cases the user only has to configure theDATASET_CLASS
to indicate which dataset should be tested and supply fake data within theinject_fake_data
method. I've added tests forCaltech256
,CIFAR10
, andCIFAR100
to showcase this.The user can add additional
CONFIGS
that are all tested instead of only the standard configuration. This can be used to test if the dataset works as intended for different splits or other parameter combinations.Two things we need to discuss:
Some datasets require additional arguments without default values. For example
vision/torchvision/datasets/ucf101.py
Line 46 in 61e00d5
vision/torchvision/datasets/coco.py
Lines 49 to 52 in 61e00d5
We need an interface for the user to be able to provide them. My idea was to put this into
inject_fake_data
. In addition to theinfo
dict, it could return a sequence of arguments passed to the constructor. If not returned, we could simply assume that onlyroot
should be passed, making this optional.As of now, no tests for the
(target_)transforms?
is implemented. Writing the test is trivial, but we need a way to indicate which transform is applied to which output argument. Most of the timetransform
is applied to the first,target_transform
to the second, andtransforms
to the first two. But there are exceptions to this:vision/torchvision/datasets/phototour.py
Lines 106 to 109 in 61e00d5
vision/torchvision/datasets/ucf101.py
Lines 108 to 111 in 61e00d5
My idea is to provide
TRANSFORM_IDCS
andTARGET_TRANSFORM_IDCS
class attributes (TRANSFORMS_IDCS
is not necessary as it can be combined from the other two). These would be mandatory, but could have good default values in(Image|Video)DatasetTestCase
.