
[RFC] Improve Transforms testing codebase #2881

Closed

vfdev-5 opened this issue Oct 23, 2020 · 15 comments

@vfdev-5
Collaborator

vfdev-5 commented Oct 23, 2020

🚀 Feature

The current way of testing transformations is not satisfactory due to random configurations, duplicated code and an unclear structure (test_functional_tensor.py, test_transforms.py, test_transforms_tensor.py). This feature request proposes to rewrite the transforms tests in order to tackle these problems.

Motivation

A structured approach for transforms tests provides:

  • better coverage
  • a simpler way to add tests for new transforms
  • extensibility to other transforms covering more input data types (images, masks, bboxes, etc)

Pitch

What do we need:

  • Tests for functional operations (torchvision.transforms.functional)
    • code coverage checks for incorrect input
    • representative and deterministic operation configs to cover the documented API
    • all possible input types:
      • PIL, accimage?, torch.Tensor, batch of tensors
      • tensor's device: CPU/CUDA
      • tensor dtype: integral and float types + half on CUDA
    • ensure the same works for the torchscripted transform
    • check result correctness vs a stored or precomputed reference
    • consistency check for results: torchscripted, tensor and PIL (see the sketch after this list)
  • Tests for Transforms (torchvision.transforms)
    • code coverage checks for incorrect input
    • representative and deterministic transform configs to cover the documented API
    • tests for generated random parameters
    • ensure the same works for the torchscripted transform
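
To illustrate the consistency check mentioned above, here is a minimal sketch using torchvision.transforms.functional (the tolerance value is an arbitrary assumption, not a proposed threshold):

import torch
import torchvision.transforms.functional as F

def check_resize_consistency():
    tensor_img = torch.randint(0, 256, (3, 12, 16), dtype=torch.uint8)
    pil_img = F.to_pil_image(tensor_img)

    out_tensor = F.resize(tensor_img, [6, 8])
    out_pil = F.to_tensor(F.resize(pil_img, [6, 8]))  # float in [0, 1]

    # Scripted and eager tensor paths should match exactly
    scripted_resize = torch.jit.script(F.resize)
    assert torch.equal(scripted_resize(tensor_img, [6, 8]), out_tensor)

    # Tensor and PIL backends use different interpolation kernels,
    # so only approximate equality is expected
    assert torch.allclose(out_tensor.float() / 255.0, out_pil, atol=0.05)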

How to do that:

Limitations:

  • pytest.parametrize cannot be used due to certain internal reasons.

1) Inspiration from "Simplify and organize test_ops" PR

The common part of testing a transformation can be defined in a base class, and derived classes configure the input type etc.
Example from the referenced PR:

class RoIOpTester:
    # Defines functional tests to execute
    # on cpu, on cuda, other options etc

class RoIPoolTester(RoIOpTester, unittest.TestCase):

    def fn(self, *args, **kwargs):
        # function to test

    def get_script_fn(self, *args, **kwargs):
        # scripted function

    def expected_fn(self, *args, **kwargs):
        # reference function
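
To make this concrete, here is a minimal self-contained sketch of the pattern applied to a transform (class names are hypothetical, not taken from the referenced PR):

import unittest
import torch
import torchvision.transforms.functional as F


class TransformTester:
    # Shared test logic; subclasses provide fn / get_script_fn / expected_fn

    def test_cpu(self):
        self._check("cpu")

    @unittest.skipIf(not torch.cuda.is_available(), "CUDA unavailable")
    def test_cuda(self):
        self._check("cuda")

    def _check(self, device):
        img = torch.randint(0, 256, (3, 12, 16), dtype=torch.uint8, device=device)
        expected = self.expected_fn(img)
        self.assertTrue(torch.equal(self.fn(img), expected))
        self.assertTrue(torch.equal(self.get_script_fn()(img), expected))


class HFlipTester(TransformTester, unittest.TestCase):

    def fn(self, img):
        return F.hflip(img)

    def get_script_fn(self):
        return torch.jit.script(F.hflip)

    def expected_fn(self, img):
        return img.flip(-1)  # horizontal flip reverses the width (last) dim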

2) Use torch.testing._internal

from torch.testing._internal.common_utils import TestCase, run_tests
from torch.testing._internal.common_device_type import dtypes, dtypesIfCUDA, instantiate_device_type_tests
from torch.testing import floating_types, floating_types_and_half, integral_types


class Tester(TestCase):

    @dtypes(*(floating_types() + integral_types()))
    @dtypesIfCUDA(*(floating_types_and_half() + integral_types()))
    def test_resize(self, device, dtype):
        img = self.create_image(h=12, w=16, device=device, dtype=dtype)
        ...


instantiate_device_type_tests(Tester, globals())

if __name__ == '__main__':
    run_tests()

this gives

TesterCPU::test_resize_cpu_float32 PASSED [ 14%]
TesterCPU::test_resize_cpu_float64 PASSED [ 28%]
TesterCPU::test_resize_cpu_int16 PASSED   [ 42%]
TesterCPU::test_resize_cpu_int32 PASSED   [ 57%]
TesterCPU::test_resize_cpu_int64 PASSED   [ 71%]
TesterCPU::test_resize_cpu_int8 PASSED    [ 85%]
TesterCPU::test_resize_cpu_uint8 PASSED   [100%]

Problems:

  • dtypes works well for torch.Tensor dtypes, but there is no simple way to add PIL modes as a dtype

3) Parametrized

The parameterized package looks promising and could potentially work around the pytest limitation (to be confirmed by fb).

import unittest

from torch.testing import floating_types, integral_types
from parameterized import parameterized


class Tester(unittest.TestCase):

    @parameterized.expand(
        [("cuda", dt) for dt in floating_types() + integral_types()] +
        [("cpu", dt) for dt in floating_types() + integral_types()]
    )
    def test_resize(self, device, dtype):
        pass


if __name__ == "__main__":
    unittest.main()

this gives

TestMathUnitTest::test_resize_00_cuda PASSED  [  7%]
TestMathUnitTest::test_resize_01_cuda PASSED  [ 14%]
TestMathUnitTest::test_resize_02_cuda PASSED  [ 21%]
TestMathUnitTest::test_resize_03_cuda PASSED  [ 28%]
TestMathUnitTest::test_resize_04_cuda PASSED  [ 35%]
TestMathUnitTest::test_resize_05_cuda PASSED  [ 42%]
TestMathUnitTest::test_resize_06_cuda PASSED  [ 50%]
TestMathUnitTest::test_resize_07_cpu PASSED   [ 57%]
TestMathUnitTest::test_resize_08_cpu PASSED   [ 64%]
TestMathUnitTest::test_resize_09_cpu PASSED   [ 71%]
TestMathUnitTest::test_resize_10_cpu PASSED   [ 78%]
TestMathUnitTest::test_resize_11_cpu PASSED   [ 85%]
TestMathUnitTest::test_resize_12_cpu PASSED   [ 92%]
TestMathUnitTest::test_resize_13_cpu PASSED   [100%]

Problems:

  • project adoption and maintenance: the last commit was in Apr 2020

4) Another approach, inspired by the torchaudio tests

Split tests into separate files, for example per device (as torchaudio does for CPU/CUDA).

We can similarly add a file for PIL input, e.g. torchscript_consistency_pil_test.py
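
A sketch of what this layout could look like for torchvision (all file and class names below are hypothetical):

# test/transforms_test_impl.py -- shared test definitions, not collected directly
import torch
import torchvision.transforms.functional as F


class TransformsTestImpl:
    device = None  # bound by the concrete per-device test files

    def test_hflip(self):
        img = torch.randint(0, 256, (3, 12, 16), dtype=torch.uint8, device=self.device)
        self.assertTrue(torch.equal(F.hflip(img), img.flip(-1)))


# test/transforms_cpu_test.py -- only this file would run in CPU-only environments
# import unittest
# from transforms_test_impl import TransformsTestImpl
#
# class TransformsCPUTest(TransformsTestImpl, unittest.TestCase):
#     device = "cpu"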

Open questions

  • How to inject an operation's configuration? E.g.:
img = ...
ref_fn = ...

for config in test_configs:
    output = fn(img, **config)
    true_output = ref_fn(img, **config)
    self.assertEqual(output, true_output)

Additional context

Recent bugs (e.g. #2869) show unsatisfactory code coverage for transforms.

cc @vfdev-5 @fmassa @datumbox @mthrok

@fmassa
Member

fmassa commented Oct 27, 2020

Thanks for the detailed issue!

A few thoughts:

check result correctness vs stored or precomputed reference

For the correctness check, I think it would be preferable, if possible, to avoid having stored files with expected results, as this is very opaque and might make it more difficult to make modifications or improvements.
My first idea would be to have some simple constructed images that we can test for correctness. For example, if we have a function like draw_circle_at or draw_rectangle_at that draws a simple white shape on a black background, we can by definition have the ground truth for many operations via some math.
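
For instance (draw_rectangle_at here is a hypothetical helper, not an existing torchvision function):

import torch
import torchvision.transforms.functional as F

def draw_rectangle_at(h, w, top, left, rh, rw):
    """White rectangle on a black background."""
    img = torch.zeros(1, h, w, dtype=torch.uint8)
    img[:, top:top + rh, left:left + rw] = 255
    return img

# Horizontally flipping a rectangle at column `left` must, by definition,
# yield the same rectangle at column `w - left - rw`
img = draw_rectangle_at(10, 10, top=2, left=1, rh=3, rw=4)
expected = draw_rectangle_at(10, 10, top=2, left=10 - 1 - 4, rh=3, rw=4)
assert torch.equal(F.hflip(img), expected)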

accimage?

I think accimage is less of a priority. It's not maintained, and we can probably integrate any potential optimization present there with ipp either in torchvision or in PyTorch (the only exposed ops are resize and flip, and we should benchmark to see how ipp fares compared to the current CPU implementations in PyTorch).

How to do that?

I think I might be leaning towards the first two options, with maybe a preference for option 2.

Wrt the limitations of dtypes, its implementation in PyTorch is fairly small and we could most probably adapt it to our own needs.

How to inject an operation's configuration?

I think an approach similar to the custom @dtypes would work here, and it is a subset of what parametrize does. So you could do

@configurations([1, 2, 3])
def test_1(self, config):
    output = fn(**config)
    ...
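
One possible implementation of such a @configurations decorator (a sketch, not an existing PyTorch API) runs the test body once per config via subTest, so a failure still reports the offending config even though there is a single test name:

import functools

def configurations(configs):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, *args, **kwargs):
            for config in configs:
                with self.subTest(config=config):
                    fn(self, config, *args, **kwargs)
        return wrapper
    return decorator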

@datumbox
Contributor

+1 on using parameterized or a similar solution that supports annotations.

Right now we often use complex loops/if-statements in unit-testing. This is generally considered a bad practice and it makes it very difficult for us to debug/pinpoint problems.

Here is an example of a test that I wrote in the past that has this issue:

vision/test/test_ops.py, lines 531 to 533 (at 005355b):

def _test_forward(self, device, contiguous):
    for batch_sz in [0, 33]:
        self._test_forward_with_batchsize(device, contiguous, batch_sz)

It became quite hard to pinpoint where the problem was when it failed in a recent PR.

@mthrok
Contributor

mthrok commented Oct 27, 2020

I love the direction this RFC is going.

One thing to note regarding the split by device. The reason I decided to split the GPU/CUDA tests of torchaudio into separate files is fbcode. CUDA tests cannot run in fbcode, and constantly skipping them will trigger alerts. There is no way in fbcode to disable such an alarm because, from the viewpoint of fbcode, there is no point in having a test that is constantly skipped. PyTorch core considers it fine to receive these alerts and has decided to ignore them (or probably because it's too much work to do such a migration), but since torchaudio is small and new, I decided to split them into separate files and run only the CPU tests in fbcode.

I do not know if the same thing can be accomplished with PyTorch's test utilities. I have not played around with them.

@vfdev-5
Collaborator Author

vfdev-5 commented Oct 30, 2020

FYI: Using torch.testing._internal to support PIL and configurations would require recoding existing classes and methods:

  • To enable PIL mode support in addition to the existing tensor device/dtype handling:
    • we need to introduce a class vision_dtypes derived from torch.testing._internal.common_device_type.dtypes (easy)
Code and usage:
import torch
from torch.testing._internal.common_device_type import dtypes


class vision_dtypes(dtypes):

    def __init__(self, *args, **kwargs):
        # Intentionally skips dtypes.__init__, whose checks reject
        # non-torch dtypes such as the PIL mode strings handled below
        if len(args) > 0 and isinstance(args[0], (list, tuple)):
            for arg in args:
                assert isinstance(arg, (list, tuple)), \
                    "When one dtype variant is a tuple or list, " \
                    "all dtype variants must be. " \
                    "Received non-list non-tuple dtype {0}".format(str(arg))
                assert all(self._is_supported_dtype(dtype) for dtype in arg), "Unknown dtype in {0}".format(str(arg))
        else:
            assert all(self._is_supported_dtype(arg) for arg in args), "Unknown dtype in {0}".format(str(args))

        self.args = args
        self.device_type = kwargs.get('device_type', 'all')

    @staticmethod
    def _is_supported_dtype(dtype):
        if isinstance(dtype, torch.dtype):
            return True
        if isinstance(dtype, str):
            # https://pillow.readthedocs.io/en/stable/handbook/concepts.html?highlight=modes#modes
            modes = ["1", "L", "P", "RGB", "RGBA", "CMYK", "YCbCr", "LAB", "HSV", "I", "F"]
            return dtype in ["PIL.{}".format(mode) for mode in modes]
        return False

and this can be used as follows (reusing the imports shown in option 2 above):

class Tester(TestCase):

    @vision_dtypes(*(floating_types() + integral_types() + ("PIL.RGB", "PIL.P", "PIL.F")))
    @dtypesIfCUDA(*(floating_types_and_half() + integral_types()))
    def test_resize(self, device, dtype, config=None):
        # img = self.create_image(h=12, w=16, device=device, dtype=dtype)
        pass


instantiate_device_type_tests(Tester, globals())

@vfdev-5
Collaborator Author

vfdev-5 commented Nov 4, 2020

After discussing with @fmassa, we can propose a change to torch.testing._internal in order to enable a more flexible approach: a) a user can easily add "custom" dtypes (e.g. CPU PIL modes for torchvision), b) a user can easily create decorators (something like @ops) to insert a parameterization/configuration as an arg to the test.
The next step is to create a prototype of a new common_device_type.py, keeping BC and covering the above features.

cc @mruberry @mthrok

@mruberry
Contributor

mruberry commented Nov 4, 2020

@vfdev-5 Would you just like an @variants(iterable of (str, obj) pairs) decorator that instantiates subtests named from the strings and passes the values in obj to those tests?

@vfdev-5
Collaborator Author

vfdev-5 commented Nov 4, 2020

@mruberry yes, something like that would be perfect. The usage would look like this:

class Tester(TestCase):

    @variants([
         ("nearest_single_value1", {"size": 32, "interpolation": 0}),
         ("nearest_single_value2", {"size": 25, "interpolation": 0}),
         ("linear_single_value1", {"size": 32, "interpolation": 2}),
         ("linear_single_value2", {"size": 25, "interpolation": 2}),
     ])
    @vision_dtypes(*(floating_types() + integral_types() + ("PIL.RGB", "PIL.P", "PIL.F")))
    @dtypesIfCUDA(*(floating_types_and_half() + integral_types()))
    def test_resize(self, device, dtype, config):
        # img = self.create_image(h=12, w=16, device=device, dtype=dtype)
        # config = {"size": X, "interpolation": Y}
        pass


# instantiate_device_type_tests(Tester, globals())
instantiate_tests(Tester, globals())

this gives
TesterCPU::test_resize_cpu_float32_nearest_single_value1
TesterCPU::test_resize_cpu_float32_nearest_single_value2
TesterCPU::test_resize_cpu_float32_linear_single_value1
TesterCPU::test_resize_cpu_float32_linear_single_value2

@mruberry
Contributor

mruberry commented Nov 4, 2020

Instead of being passed in "config", how do you feel about taking kwargs so the variant params can get splatted?

@vfdev-5
Collaborator Author

vfdev-5 commented Nov 4, 2020

Instead of being passed in "config", how do you feel about taking kwargs so the variant params can get splatted?

I think this could also work.

Actually, I've been thinking that it would be nice if something generic like variants could be used for any "cross-product" combination (without the restriction of using the variants decorator only once):

class Tester(TestCase):

    @variants("interpolation", [0, 2, 3])
    @variants("size", [32, 26, (32, 26), [26, 32]])
    @variants("dtype", [floating_types() + integral_types() + ...])
    @variants("device", ["cpu", "cuda"])
    def test_resize(self, interpolation, size, dtype, device):
        # img = self.create_image(h=12, w=16, device=device, dtype=dtype)
        pass


ParametrizedTester::test_resize_cpu_float32_size_0_interpolation_0
ParametrizedTester::test_resize_cpu_float32_size_0_interpolation_1
ParametrizedTester::test_resize_cpu_float32_size_0_interpolation_2
ParametrizedTester::test_resize_cpu_float32_size_1_interpolation_0
ParametrizedTester::test_resize_cpu_float32_size_1_interpolation_1
...

We can still use @dtypes for the device and data type configuration, but having multiple @variants could be nice.
What do you think?
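
For concreteness, a rough sketch of how such a stackable @variants decorator could be implemented on top of plain unittest (all names are hypothetical; PyTorch's eventual API may differ):

import itertools


def variants(name, values):
    # Record one parameter axis on the test function
    def decorator(fn):
        fn._variants = [(name, list(values))] + getattr(fn, "_variants", [])
        return fn
    return decorator


def instantiate_variant_tests(cls):
    # Expand every @variants-decorated test into one method per combination
    for attr, fn in list(cls.__dict__.items()):
        axes = getattr(fn, "_variants", None)
        if not axes:
            continue
        delattr(cls, attr)
        names = [n for n, _ in axes]
        for combo in itertools.product(*(vals for _, vals in axes)):
            kwargs = dict(zip(names, combo))
            test_name = attr + "".join("_{}_{}".format(n, v) for n, v in kwargs.items())

            def make_test(fn=fn, kwargs=kwargs):
                def test(self):
                    fn(self, **kwargs)
                return test

            setattr(cls, test_name, make_test())
    return cls

Applied to the Tester class above, this would generate names like test_resize_device_cpu_size_32_interpolation_0.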

@mruberry
Contributor

mruberry commented Nov 6, 2020

I agree that would be nice. I'm going to propose updates to PyTorch's test tools soon and they'll include a decorator like parameterize or variants or subtest functionality. Note that a workaround today is just to write a for loop that enumerates the cross-product of these arguments in the test itself.
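
That workaround could look like this (a sketch; the parameter values are arbitrary):

import itertools
import torch

def test_resize(self):
    for device, dtype, size in itertools.product(
            ["cpu", "cuda"], [torch.uint8, torch.float32], [32, (32, 26)]):
        with self.subTest(device=device, dtype=dtype, size=size):
            pass  # run the actual check for this combination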

@vfdev-5
Collaborator Author

vfdev-5 commented Nov 6, 2020

I'm going to propose updates to PyTorch's test tools soon and they'll include a decorator like parameterize or variants or subtest functionality.

@mruberry sounds great! Also, it would be nice to make the related classes and methods more configurable. For example, if we would like to derive from dtypes, it would be nice to refactor the type verification so that it can be overridden:

# Changed dtypes
import torch


class dtypes(object):

    # Note: *args, **kwargs for Python2 compat.
    # Python 3 allows (self, *args, device_type='all').
    def __init__(self, *args, **kwargs):
        if len(args) > 0 and isinstance(args[0], (list, tuple)):
            for arg in args:
                assert isinstance(arg, (list, tuple)), \
                    "When one dtype variant is a tuple or list, " \
                    "all dtype variants must be. " \
                    "Received non-list non-tuple dtype {0}".format(str(arg))
                # This is the change vs original dtypes implementation
                assert all(self._is_supported_dtype(dtype) for dtype in arg), "Unknown dtype in {0}".format(str(arg))
        else:
            # This is the change vs original dtypes implementation
            assert all(self._is_supported_dtype(arg) for arg in args), "Unknown dtype in {0}".format(str(args))

        self.args = args
        self.device_type = kwargs.get('device_type', 'all')

    def __call__(self, fn):
        d = getattr(fn, 'dtypes', {})
        assert self.device_type not in d, "dtypes redefinition for {0}".format(self.device_type)
        d[self.device_type] = self.args
        fn.dtypes = d
        return fn

    # This is the change vs original dtypes implementation
    @staticmethod
    def _is_supported_dtype(dtype):
        if isinstance(dtype, torch.dtype):
            return True
        return False

I was also thinking of implementing the variants decorator and proposing certain refactoring changes for dtypes, DeviceTypeTestBase and instantiate_device_type_tests. Let me know if this could overlap with your planned updates to PyTorch's test tools.

@mruberry
Contributor

mruberry commented Nov 6, 2020

Changes like that don't need to go through a special planning process, although I am curious whether @dtypes is really the best place for a change like that; how to use the new functionality would also need to be documented. It's too bad that subclassing torch.dtype isn't possible.

@datumbox
Contributor

datumbox commented Jun 2, 2021

@NicolasHug @vfdev-5 It seems to me that some of the changes described in this ticket are now on master. To what degree do you think we covered this ticket?

@vfdev-5
Collaborator Author

vfdev-5 commented Jun 2, 2021

@datumbox we can now go with pytest, which was not the case at the moment of creating this RFC, so we can close this issue.

@NicolasHug
Member

NicolasHug commented Jun 2, 2021

I started opening issues about porting the tests to pytest: #3915, #3914, #3909.

I'm doing it on a per-file basis because the files are all quite different. I'll get to the transforms tests soon(ish?).

EDIT: just opened #3945
