
[RFC] Improve Transforms testing codebase #2881

Closed

vfdev-5 opened this issue Oct 23, 2020 · 15 comments

@vfdev-5
Collaborator

vfdev-5 commented Oct 23, 2020

🚀 Feature

The current way of testing transformations is not satisfactory due to random configurations, duplicated code and an unclear structure (test_functional_tensor.py, test_transforms.py, test_transforms_tensor.py). This feature request proposes to rewrite the transforms tests in order to tackle these problems.

Motivation

A structured approach for transforms tests provides:

  • better coverage
  • a simpler way to add tests for new transforms
  • extensibility to other transforms covering more input data types (images, masks, bboxes, etc)

Pitch

What do we need:

  • Tests for functional operations (torchvision.transforms.functional)
    • code coverage checks for incorrect input
    • representative and deterministic operation configs to cover the documented API
    • all possible input types:
      • PIL, accimage?, torch.Tensor, batch of tensors
      • tensor's device: CPU/CUDA
      • tensor dtype: integral and float types + half on CUDA
    • ensure the same works for the torchscripted transform
    • check result correctness vs a stored or precomputed reference
    • consistency check for results: torchscripted, tensor and PIL (see the sketch after this list)
  • Tests for Transforms (torchvision.transforms)
    • code coverage checks for incorrect input
    • representative and deterministic transform configs to cover the documented API
    • tests for generated random parameters
    • ensure the same works for the torchscripted transform
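
To illustrate the consistency check mentioned above, here is a minimal sketch using torchvision.transforms.functional (the tolerance value is an arbitrary assumption, not a proposed threshold):

import torch
import torchvision.transforms.functional as F

def check_resize_consistency():
    tensor_img = torch.randint(0, 256, (3, 12, 16), dtype=torch.uint8)
    pil_img = F.to_pil_image(tensor_img)

    out_tensor = F.resize(tensor_img, [6, 8])
    out_pil = F.to_tensor(F.resize(pil_img, [6, 8]))  # float in [0, 1]

    # Scripted and eager tensor paths should match exactly
    scripted_resize = torch.jit.script(F.resize)
    assert torch.equal(scripted_resize(tensor_img, [6, 8]), out_tensor)

    # Tensor and PIL backends use different interpolation kernels,
    # so only approximate equality is expected
    assert torch.allclose(out_tensor.float() / 255.0, out_pil, atol=0.05)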

How to do that:

Limitations:

  • pytest.parametrize cannot be used due to certain internal reasons.

1) Inspiration from "Simplify and organize test_ops" PR

The common part of testing a transformation can be defined in a base class, and derived classes configure the input type etc.
Example from the referenced PR:

class RoIOpTester:
    # Defines functional tests to execute
    # on cpu, on cuda, other options etc

class RoIPoolTester(RoIOpTester, unittest.TestCase):

    def fn(self, *args, **kwargs):
        # function to test

    def get_script_fn(self, *args, **kwargs):
        # scripted function

    def expected_fn(self, *args, **kwargs):
        # reference function
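
To make this concrete, here is a minimal self-contained sketch of the pattern applied to a transform (class names are hypothetical, not taken from the referenced PR):

import unittest
import torch
import torchvision.transforms.functional as F


class TransformTester:
    # Shared test logic; subclasses provide fn / get_script_fn / expected_fn

    def test_cpu(self):
        self._check("cpu")

    @unittest.skipIf(not torch.cuda.is_available(), "CUDA unavailable")
    def test_cuda(self):
        self._check("cuda")

    def _check(self, device):
        img = torch.randint(0, 256, (3, 12, 16), dtype=torch.uint8, device=device)
        expected = self.expected_fn(img)
        self.assertTrue(torch.equal(self.fn(img), expected))
        self.assertTrue(torch.equal(self.get_script_fn()(img), expected))


class HFlipTester(TransformTester, unittest.TestCase):

    def fn(self, img):
        return F.hflip(img)

    def get_script_fn(self):
        return torch.jit.script(F.hflip)

    def expected_fn(self, img):
        return img.flip(-1)  # horizontal flip reverses the width (last) dim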

2) Use torch.testing._internal

from torch.testing._internal.common_utils import TestCase, run_tests
from torch.testing._internal.common_device_type import dtypes, dtypesIfCUDA, instantiate_device_type_tests
from torch.testing import floating_types, floating_types_and_half, integral_types


class Tester(TestCase):

    @dtypes(*(floating_types() + integral_types()))
    @dtypesIfCUDA(*(floating_types_and_half() + integral_types()))
    def test_resize(self, device, dtype):
        img = self.create_image(h=12, w=16, device=device, dtype=dtype)
        ...


instantiate_device_type_tests(Tester, globals())

if __name__ == '__main__':
    run_tests()

this gives

TesterCPU::test_resize_cpu_float32 PASSED [ 14%]
TesterCPU::test_resize_cpu_float64 PASSED [ 28%]
TesterCPU::test_resize_cpu_int16 PASSED   [ 42%]
TesterCPU::test_resize_cpu_int32 PASSED   [ 57%]
TesterCPU::test_resize_cpu_int64 PASSED   [ 71%]
TesterCPU::test_resize_cpu_int8 PASSED    [ 85%]
TesterCPU::test_resize_cpu_uint8 PASSED   [100%]

Problems:

  • dtypes works well for torch.Tensor dtypes, but there is no simple way to add PIL modes as a dtype

3) Parametrized

The parameterized package looks promising and could potentially work around the pytest limitation (to be confirmed by fb).

import unittest

from torch.testing import floating_types, integral_types
from parameterized import parameterized


class Tester(unittest.TestCase):

    @parameterized.expand(
        [("cuda", dt) for dt in floating_types() + integral_types()] +
        [("cpu", dt) for dt in floating_types() + integral_types()]
    )
    def test_resize(self, device, dtype):
        pass


if __name__ == "__main__":
    unittest.main()

this gives

TestMathUnitTest::test_resize_00_cuda PASSED  [  7%]
TestMathUnitTest::test_resize_01_cuda PASSED  [ 14%]
TestMathUnitTest::test_resize_02_cuda PASSED  [ 21%]
TestMathUnitTest::test_resize_03_cuda PASSED  [ 28%]
TestMathUnitTest::test_resize_04_cuda PASSED  [ 35%]
TestMathUnitTest::test_resize_05_cuda PASSED  [ 42%]
TestMathUnitTest::test_resize_06_cuda PASSED  [ 50%]
TestMathUnitTest::test_resize_07_cpu PASSED   [ 57%]
TestMathUnitTest::test_resize_08_cpu PASSED   [ 64%]
TestMathUnitTest::test_resize_09_cpu PASSED   [ 71%]
TestMathUnitTest::test_resize_10_cpu PASSED   [ 78%]
TestMathUnitTest::test_resize_11_cpu PASSED   [ 85%]
TestMathUnitTest::test_resize_12_cpu PASSED   [ 92%]
TestMathUnitTest::test_resize_13_cpu PASSED   [100%]

Problems:

  • project adoption and maintenance: the last commit was in Apr 2020

4) Another approach, inspired by the torchaudio tests

Split tests into separate files, for example per device (as torchaudio does for CPU/CUDA).

We can similarly add a file for PIL input, e.g. torchscript_consistency_pil_test.py
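
A sketch of what this layout could look like for torchvision (all file and class names below are hypothetical):

# test/transforms_test_impl.py -- shared test definitions, not collected directly
import torch
import torchvision.transforms.functional as F


class TransformsTestImpl:
    device = None  # bound by the concrete per-device test files

    def test_hflip(self):
        img = torch.randint(0, 256, (3, 12, 16), dtype=torch.uint8, device=self.device)
        self.assertTrue(torch.equal(F.hflip(img), img.flip(-1)))


# test/transforms_cpu_test.py -- only this file would run in CPU-only environments
# import unittest
# from transforms_test_impl import TransformsTestImpl
#
# class TransformsCPUTest(TransformsTestImpl, unittest.TestCase):
#     device = "cpu"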

Open questions

  • How to inject an operation's configuration? E.g.:
img = ...
ref_fn = ...

for config in test_configs:
    output = fn(img, **config)
    true_output = ref_fn(img, **config)
    self.assertEqual(output, true_output)

Additional context

Recent bugs (e.g. #2869) show unsatisfactory code coverage for transforms.

cc @vfdev-5 @fmassa @datumbox @mthrok

@fmassa
Member

fmassa commented Oct 27, 2020

Thanks for the detailed issue!

A few thoughts:

check result correctness vs stored or precomputed reference

For the correctness check, I think it would be preferable, if possible, to avoid having stored files with expected results, as this is very opaque and might make it more difficult to make modifications or improvements.
My first idea would be to have some simple constructed images that we can test for correctness. For example, if we have a function like draw_circle_at or draw_rectangle_at that draws a simple white shape on a black background, we can by definition have the ground truth for many operations via some math.
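
For instance (draw_rectangle_at here is a hypothetical helper, not an existing torchvision function):

import torch
import torchvision.transforms.functional as F

def draw_rectangle_at(h, w, top, left, rh, rw):
    """White rectangle on a black background."""
    img = torch.zeros(1, h, w, dtype=torch.uint8)
    img[:, top:top + rh, left:left + rw] = 255
    return img

# Horizontally flipping a rectangle at column `left` must, by definition,
# yield the same rectangle at column `w - left - rw`
img = draw_rectangle_at(10, 10, top=2, left=1, rh=3, rw=4)
expected = draw_rectangle_at(10, 10, top=2, left=10 - 1 - 4, rh=3, rw=4)
assert torch.equal(F.hflip(img), expected)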

accimage?

I think accimage is less of a priority. It's not maintained, and we can probably integrate any potential optimization present there with ipp either in torchvision or in PyTorch (the only exposed ops are resize and flip, and we should benchmark to see how ipp fares compared to the current CPU implementations in PyTorch).

How to do that?

I think I might be leaning towards the first two options, with maybe a preference for option 2.

Wrt the limitations of dtypes, its implementation in PyTorch is fairly small and we could most probably adapt it to our own needs.

How to inject an operation's configuration?

I think an approach similar to the custom @dtypes would work here, and it is a subset of what parametrize does. So you could do

@configurations([1, 2, 3])
def test_1(self, config):
    output = fn(**config)
    ...
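
One possible implementation of such a @configurations decorator (a sketch, not an existing PyTorch API) runs the test body once per config via subTest, so a failure still reports the offending config even though there is a single test name:

import functools

def configurations(configs):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, *args, **kwargs):
            for config in configs:
                with self.subTest(config=config):
                    fn(self, config, *args, **kwargs)
        return wrapper
    return decorator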

@datumbox
Contributor

+1 on using parameterized or a similar solution that supports annotations.

Right now we often use complex loops/if-statements in unit-testing. This is generally considered a bad practice and it makes it very difficult for us to debug/pinpoint problems.

Here is an example of a test that I wrote in the past that has this issue:

vision/test/test_ops.py, lines 531 to 533 (at 005355b):

def _test_forward(self, device, contiguous):
    for batch_sz in [0, 33]:
        self._test_forward_with_batchsize(device, contiguous, batch_sz)

It became quite hard to pinpoint where the problem was when it failed in a recent PR.

@mthrok
Contributor

mthrok commented Oct 27, 2020

I love the direction this RFC is going.

One thing to note regarding the split by device. The reason I decided to split the GPU/CUDA tests of torchaudio into separate files is fbcode. CUDA tests cannot run in fbcode, and constantly skipping them will trigger alerts. There is no way in fbcode to disable such an alarm because, from the viewpoint of fbcode, there is no point in having a test that is constantly skipped. PyTorch core considers it fine to receive these alerts and has decided to ignore them (or probably because it's too much work to do such a migration), but since torchaudio is small and new, I decided to split them into separate files and run only the CPU tests in fbcode.

I do not know if the same thing can be accomplished with PyTorch's test utilities. I have not played around with them.

@vfdev-5
Collaborator Author

vfdev-5 commented Oct 30, 2020

FYI: Using torch.testing._internal to support PIL and configurations would require recoding existing classes and methods:

  • To enable PIL mode support in addition to the existing tensor device/dtype handling:
    • we need to introduce a class vision_dtypes derived from torch.testing._internal.common_device_type.dtypes (easy)
Code and usage:
import torch
from torch.testing._internal.common_device_type import dtypes


class vision_dtypes(dtypes):

    def __init__(self, *args, **kwargs):
        # Intentionally skips dtypes.__init__, whose checks reject
        # non-torch dtypes such as the PIL mode strings handled below
        if len(args) > 0 and isinstance(args[0], (list, tuple)):
            for arg in args:
                assert isinstance(arg, (list, tuple)), \
                    "When one dtype variant is a tuple or list, " \
                    "all dtype variants must be. " \
                    "Received non-list non-tuple dtype {0}".format(str(arg))
                assert all(self._is_supported_dtype(dtype) for dtype in arg), "Unknown dtype in {0}".format(str(arg))
        else:
            assert all(self._is_supported_dtype(arg) for arg in args), "Unknown dtype in {0}".format(str(args))

        self.args = args
        self.device_type = kwargs.get('device_type', 'all')

    @staticmethod
    def _is_supported_dtype(dtype):
        if isinstance(dtype, torch.dtype):
            return True
        if isinstance(dtype, str):
            # https://pillow.readthedocs.io/en/stable/handbook/concepts.html?highlight=modes#modes
            modes = ["1", "L", "P", "RGB", "RGBA", "CMYK", "YCbCr", "LAB", "HSV", "I", "F"]
            return dtype in ["PIL.{}".format(mode) for mode in modes]
        return False

and this can be used as follows (reusing the imports shown in option 2 above):

class Tester(TestCase):

    @vision_dtypes(*(floating_types() + integral_types() + ("PIL.RGB", "PIL.P", "PIL.F")))
    @dtypesIfCUDA(*(floating_types_and_half() + integral_types()))
    def test_resize(self, device, dtype, config=None):
        # img = self.create_image(h=12, w=16, device=device, dtype=dtype)
        pass


instantiate_device_type_tests(Tester, globals())

@vfdev-5
Collaborator Author

vfdev-5 commented Nov 4, 2020

After discussing with @fmassa, we can propose a change to torch.testing._internal in order to enable a more flexible approach: a) a user can easily add "custom" dtypes (e.g. CPU PIL modes for torchvision), b) a user can easily create decorators (something like @ops) to insert a parameterization/configuration as an arg to the test.
The next step is to create a prototype of a new common_device_type.py, keeping BC and covering the above features.

cc @mruberry @mthrok

@mruberry
Contributor

mruberry commented Nov 4, 2020

@vfdev-5 Would you just like an @variants(iterable of (str, obj) pairs) decorator that instantiates subtests named from the strings and passes the values in obj to those tests?

@vfdev-5
Collaborator Author

vfdev-5 commented Nov 4, 2020

@mruberry yes, something like that would be perfect. The usage would look like this:

class Tester(TestCase):

    @variants([
         ("nearest_single_value1", {"size": 32, "interpolation": 0}),
         ("nearest_single_value2", {"size": 25, "interpolation": 0}),
         ("linear_single_value1", {"size": 32, "interpolation": 2}),
         ("linear_single_value2", {"size": 25, "interpolation": 2}),
     ])
    @vision_dtypes(*(floating_types() + integral_types() + ("PIL.RGB", "PIL.P", "PIL.F")))
    @dtypesIfCUDA(*(floating_types_and_half() + integral_types()))
    def test_resize(self, device, dtype, config):
        # img = self.create_image(h=12, w=16, device=device, dtype=dtype)
        # config = {"size": X, "interpolation": Y}
        pass


# instantiate_device_type_tests(Tester, globals())
instantiate_tests(Tester, globals())

this gives
TesterCPU::test_resize_cpu_float32_nearest_single_value1
TesterCPU::test_resize_cpu_float32_nearest_single_value2
TesterCPU::test_resize_cpu_float32_linear_single_value1
TesterCPU::test_resize_cpu_float32_linear_single_value2

@mruberry
Contributor

mruberry commented Nov 4, 2020

Instead of being passed in "config", how do you feel about taking kwargs so the variant params can get splatted?

@vfdev-5
Collaborator Author

vfdev-5 commented Nov 4, 2020

Instead of being passed in "config", how do you feel about taking kwargs so the variant params can get splatted?

I think this could also work.

Actually, I've been thinking that it would be nice if something generic like variants could be used for any "cross-product" combination (without the restriction of using the variants decorator only once):

class Tester(TestCase):

    @variants("interpolation", [0, 2, 3])
    @variants("size", [32, 26, (32, 26), [26, 32]])
    @variants("dtype", [floating_types() + integral_types() + ...])
    @variants("device", ["cpu", "cuda"])
    def test_resize(self, interpolation, size, dtype, device):
        # img = self.create_image(h=12, w=16, device=device, dtype=dtype)
        pass


ParametrizedTester::test_resize_cpu_float32_size_0_interpolation_0
ParametrizedTester::test_resize_cpu_float32_size_0_interpolation_1
ParametrizedTester::test_resize_cpu_float32_size_0_interpolation_2
ParametrizedTester::test_resize_cpu_float32_size_1_interpolation_0
ParametrizedTester::test_resize_cpu_float32_size_1_interpolation_1
...

We can still use @dtypes for the device and data type configuration, but having multiple @variants could be nice.
What do you think?
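
For concreteness, a rough sketch of how such a stackable @variants decorator could be implemented on top of plain unittest (all names are hypothetical; PyTorch's eventual API may differ):

import itertools


def variants(name, values):
    # Record one parameter axis on the test function
    def decorator(fn):
        fn._variants = [(name, list(values))] + getattr(fn, "_variants", [])
        return fn
    return decorator


def instantiate_variant_tests(cls):
    # Expand every @variants-decorated test into one method per combination
    for attr, fn in list(cls.__dict__.items()):
        axes = getattr(fn, "_variants", None)
        if not axes:
            continue
        delattr(cls, attr)
        names = [n for n, _ in axes]
        for combo in itertools.product(*(vals for _, vals in axes)):
            kwargs = dict(zip(names, combo))
            test_name = attr + "".join("_{}_{}".format(n, v) for n, v in kwargs.items())

            def make_test(fn=fn, kwargs=kwargs):
                def test(self):
                    fn(self, **kwargs)
                return test

            setattr(cls, test_name, make_test())
    return cls

Applied to the Tester class above, this would generate names like test_resize_device_cpu_size_32_interpolation_0.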

@mruberry
Contributor

mruberry commented Nov 6, 2020

I agree that would be nice. I'm going to propose updates to PyTorch's test tools soon and they'll include a decorator like parameterize or variants or subtest functionality. Note that a workaround today is just to write a for loop that enumerates the cross-product of these arguments in the test itself.
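
That workaround could look like this (a sketch; the parameter values are arbitrary):

import itertools
import torch

def test_resize(self):
    for device, dtype, size in itertools.product(
            ["cpu", "cuda"], [torch.uint8, torch.float32], [32, (32, 26)]):
        with self.subTest(device=device, dtype=dtype, size=size):
            pass  # run the actual check for this combination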

@vfdev-5
Collaborator Author

vfdev-5 commented Nov 6, 2020

I'm going to propose updates to PyTorch's test tools soon and they'll include a decorator like parameterize or variants or subtest functionality.

@mruberry sounds great! Also, it would be nice to make the related classes and methods more configurable. For example, if we would like to derive from dtypes, it would be nice to refactor the type verification so that it can be overridden:

# Changed dtypes
import torch


class dtypes(object):

    # Note: *args, **kwargs for Python2 compat.
    # Python 3 allows (self, *args, device_type='all').
    def __init__(self, *args, **kwargs):
        if len(args) > 0 and isinstance(args[0], (list, tuple)):
            for arg in args:
                assert isinstance(arg, (list, tuple)), \
                    "When one dtype variant is a tuple or list, " \
                    "all dtype variants must be. " \
                    "Received non-list non-tuple dtype {0}".format(str(arg))
                # This is the change vs original dtypes implementation
                assert all(self._is_supported_dtype(dtype) for dtype in arg), "Unknown dtype in {0}".format(str(arg))
        else:
            # This is the change vs original dtypes implementation
            assert all(self._is_supported_dtype(arg) for arg in args), "Unknown dtype in {0}".format(str(args))

        self.args = args
        self.device_type = kwargs.get('device_type', 'all')

    def __call__(self, fn):
        d = getattr(fn, 'dtypes', {})
        assert self.device_type not in d, "dtypes redefinition for {0}".format(self.device_type)
        d[self.device_type] = self.args
        fn.dtypes = d
        return fn

    # This is the change vs original dtypes implementation
    @staticmethod
    def _is_supported_dtype(dtype):
        if isinstance(dtype, torch.dtype):
            return True
        return False

I was also thinking of implementing the variants decorator and proposing certain refactoring changes for dtypes, DeviceTypeTestBase and instantiate_device_type_tests. Let me know if this could overlap with your planned updates to PyTorch's test tools.

@mruberry
Contributor

mruberry commented Nov 6, 2020

Changes like that don't need to go through a special planning process, although I am curious whether @dtypes is really the best place for a change like that; how to use the new functionality would also need to be documented. It's too bad that subclassing torch.dtype isn't possible.

@datumbox
Contributor

datumbox commented Jun 2, 2021

@NicolasHug @vfdev-5 It seems to me that some of the changes described in this ticket are now on master. To what degree do you think we covered this ticket?

@vfdev-5
Collaborator Author

vfdev-5 commented Jun 2, 2021

@datumbox we can now go with pytest, which was not the case at the moment of creating this RFC, so we can close this issue.

@NicolasHug
Member

NicolasHug commented Jun 2, 2021

I started opening issues about porting the tests to pytest: #3915, #3914, #3909.

I'm doing it on a per-file basis because the files are all quite different. I'll get to the transforms tests soon(ish?).

EDIT: just opened #3945
