
Learned kernel MMD with KeOps backend #602

Merged (18 commits) on Sep 8, 2022

Conversation

@arnaudvl (Contributor) commented Aug 26, 2022:

Following #548, this PR also extends the learned kernel MMD detector with the KeOps backend to further scale and speed up the detector.

  • Add tests.
  • Check if an additional test for LearnedKernelDrift is required given the decreased coverage.
  • Fix typing.
  • Add docs; clarify that, to benefit optimally from the speed-ups, an initial bandwidth needs to be set for kernel_a and optionally kernel_b.
  • Extend MMD benchmarking example.
  • Agree on the API of utils.keops.kernels.DeepKernel. The main issue here is that DeepKernel.proj is not used within the DeepKernel's forward pass, but is explicitly called by the learned kernel drift detector. The reason is that we first need to do regular torch tensor computations using the projection, and then compute the kernel matrix with those projected features. KeOps is only used for the latter, not when computing the projection itself. So technically you could pass a separate projection (i.e. DeepKernel.proj) to the drift detector and later apply a weighted-sum kernel on the original and projected data. But this would break the consistency of the APIs/input kwargs with the other backends and realistically make the detector harder to understand. As a result, I chose to keep the same DeepKernel format as the PyTorch and TensorFlow backends and deal with this difference directly in the drift detector. This also means there are explicit checks (self.has_proj and self.has_kernel_b) to determine whether the DeepKernel format is used and the projection can be done separately (see the sketch after this list).
  • Would like to add num_workers for both the KeOps and PyTorch backends, since it can make a significant difference to dataloader speed for higher numbers of instances. Tracked in: Add num_workers to PyTorch/KeOps backend where relevant (#611).
  • There is an additional batch_size_predict kwarg which I would also like to add to the PyTorch backend. The reason is that the optimal batch size for training can be wildly different from that for prediction (where we just care about being as fast as possible within our compute budget). So if we e.g. pick batch_size=32 for training, we might want to change this to batch_size_predict=1000 for prediction using the trained kernel. There is another reason why they can differ greatly: during training of the detector, the whole training batch (all tensors, incl. the projection etc.) needs to fit on the GPU. But for predictions across all permutations at once, we can first compute all the projections separately and then lazily evaluate both the projections and the original instances for all permutations. This means we can likely get away with much higher batch sizes for the projection predictions (see the usage sketch below). Tracked in: Add batch_size_predict as kwarg to PyTorch backend for learned detectors (#612).
  • Follow-up PR separating out the PyTorch and KeOps trainers. Tracked in: Merge PyTorch and KeOps trainers for the learned kernel drift detector (#613).
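
For illustration, a minimal sketch of the DeepKernel format described above, with simplified names and signatures (a sketch under the assumptions in this description, not the actual alibi-detect implementation; validation and the trainable-eps option are omitted):

import torch.nn as nn
from pykeops.torch import LazyTensor


class DeepKernel(nn.Module):
    def __init__(self, proj: nn.Module, kernel_a: nn.Module,
                 kernel_b: nn.Module = None, eps: float = .01):
        super().__init__()
        self.proj = proj  # called explicitly by the detector, not in forward
        self.kernel_a, self.kernel_b = kernel_a, kernel_b
        self.has_proj = proj is not None          # flags the detector checks
        self.has_kernel_b = kernel_b is not None
        self.eps = eps

    def forward(self, x_proj: LazyTensor, y_proj: LazyTensor,
                x: LazyTensor = None, y: LazyTensor = None) -> LazyTensor:
        # Weighted sum of the kernel on the projected features and,
        # optionally, the kernel on the original features.
        similarity = (1 - self.eps) * self.kernel_a(x_proj, y_proj)
        if self.has_kernel_b:
            similarity = similarity + self.eps * self.kernel_b(x, y)
        return similarity

The detector would then compute the projection eagerly with torch and only wrap the results in LazyTensors for the kernel evaluation, e.g. x_proj, y_proj = kernel.proj(x), kernel.proj(y) followed by kernel(LazyTensor(x_proj[:, None, :]), LazyTensor(y_proj[None, :, :]), ...).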

The smaller potential PyTorch changes (num_workers and batch_size_predict) can be done in a quick follow-up PR.
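
A hypothetical usage sketch of the two batch-size kwargs discussed above (x_ref and kernel are assumed to be defined elsewhere; batch_size_predict applies to the KeOps backend here, with the PyTorch version only proposed in #612):

from alibi_detect.cd import LearnedKernelDrift

cd = LearnedKernelDrift(
    x_ref, kernel, backend='keops',
    batch_size=32,            # smaller batch size used while training the kernel
    batch_size_predict=1000,  # larger batch size for the permutation-test predictions
)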

@arnaudvl added the WIP (PR is a Work in Progress) and DO NOT MERGE (Do not merge this PR) labels on Aug 26, 2022
@arnaudvl requested review from @jklaise and @ascillitoe on Aug 26, 2022, 13:30

@arnaudvl removed the WIP (PR is a Work in Progress) and DO NOT MERGE (Do not merge this PR) labels on Sep 1, 2022
@codecov-commenter commented Sep 2, 2022:

Codecov Report

Merging #602 (36cb4aa) into master (0bbe586) will decrease coverage by 1.43%.
The diff coverage is 20.84%.


@@            Coverage Diff             @@
##           master     #602      +/-   ##
==========================================
- Coverage   83.58%   82.14%   -1.44%     
==========================================
  Files         207      209       +2     
  Lines       13838    14159     +321     
==========================================
+ Hits        11566    11631      +65     
- Misses       2272     2528     +256     
Impacted Files Coverage Δ
alibi_detect/cd/keops/learned_kernel.py 0.00% <0.00%> (ø)
alibi_detect/tests/test_dep_management.py 18.36% <ø> (ø)
alibi_detect/utils/keops/__init__.py 0.00% <0.00%> (ø)
alibi_detect/utils/keops/kernels.py 0.00% <0.00%> (ø)
...ibi_detect/utils/keops/tests/test_kernels_keops.py 32.32% <31.11%> (+0.18%) ⬆️
...detect/cd/keops/tests/test_learned_kernel_keops.py 36.36% <36.36%> (ø)
alibi_detect/cd/tests/test_learned_kernel.py 83.82% <50.00%> (-12.26%) ⬇️
alibi_detect/cd/learned_kernel.py 95.65% <83.33%> (-4.35%) ⬇️
alibi_detect/saving/schemas.py 99.25% <100.00%> (+<0.01%) ⬆️
alibi_detect/cd/tabular.py 97.72% <0.00%> (+0.10%) ⬆️

Comment on lines +77 to +93
n_features = [5]
n_instances = [(100, 100), (100, 75)]
kernel_a = ['GaussianRBF', 'MyKernel']
kernel_b = ['GaussianRBF', 'MyKernel', None]
eps = [0.5, 'trainable']
tests_dk = list(product(n_features, n_instances, kernel_a, kernel_b, eps))
n_tests_dk = len(tests_dk)


@pytest.fixture
def deep_kernel_params(request):
    return tests_dk[request.param]


@pytest.mark.skipif(not has_keops, reason='Skipping since pykeops is not installed.')
@pytest.mark.parametrize('deep_kernel_params', list(range(n_tests_dk)), indirect=True)
def test_deep_kernel(deep_kernel_params):
Contributor:

Not a huge fan of using this pattern for parametrizing tests; is there a reason not to parametrize each parameter directly? @ascillitoe

@ascillitoe (Contributor) commented Sep 7, 2022:

Yeah, the more conventional way to do it would be to parametrize each parameter separately, e.g. see test_save_model in test_saving.py:

@parametrize('model', [encoder_model])
@parametrize('layer', [None, -1])
def test_save_model(data, model, layer, backend, tmp_path):

This approach has the advantage of giving much more descriptive test names, which is useful when things go wrong.

We keep writing tests with the current pattern for consistency with the existing ones. But unless we are going to refactor the existing tests very soon, maybe we should prioritise adopting/exploring a new pattern so that we have less refactoring to do later...
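
For illustration, a sketch of how the product-based parametrization under review could be rewritten with stacked decorators (parameter values taken from the snippet above; not the code actually merged):

import pytest


@pytest.mark.parametrize('n_features', [5])
@pytest.mark.parametrize('n_instances', [(100, 100), (100, 75)])
@pytest.mark.parametrize('kernel_a', ['GaussianRBF', 'MyKernel'])
@pytest.mark.parametrize('kernel_b', ['GaussianRBF', 'MyKernel', None])
@pytest.mark.parametrize('eps', [0.5, 'trainable'])
def test_deep_kernel(n_features, n_instances, kernel_a, kernel_b, eps):
    ...  # each generated test ID now names the parameter values it runs with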

@arnaudvl (Contributor, Author):

OK, I wasn't aware of this tbh and just followed the existing patterns.


Comment on lines +68 to +74
if has_keops:
    class MyKernel(nn.Module):
        def __init__(self):
            super().__init__()

        def forward(self, x: LazyTensor, y: LazyTensor) -> LazyTensor:
            return (- ((x - y) ** 2).sum(-1)).exp()
Contributor:

Is typing the only reason why this needs to be inside the if has_keops block? If so, we could just use forward references to 'LazyTensor'? @ascillitoe
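
For illustration, a sketch of the forward-reference alternative being suggested (assuming LazyTensor is only needed for annotations; not the code actually merged):

from typing import TYPE_CHECKING

import torch.nn as nn

if TYPE_CHECKING:
    from pykeops.torch import LazyTensor  # only evaluated by type checkers


class MyKernel(nn.Module):
    # String annotations are not evaluated at runtime, so the class can be
    # defined even when pykeops is not installed.
    def forward(self, x: 'LazyTensor', y: 'LazyTensor') -> 'LazyTensor':
        return (- ((x - y) ** 2).sum(-1)).exp()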

Contributor:

Good point. Looks like that might be the case...

Contributor:

Did the forward-ref not work?

@arnaudvl (Contributor, Author):

@jklaise and I were chatting earlier and decided (correct me if I'm wrong) that it's not necessarily a better solution, just a bit different, so I kept it as is.

@ascillitoe (Contributor) commented Sep 8, 2022:

OK fair enough!

(I don't have a strong opinion either way)

@jklaise (Contributor) left a review:

Overall LGTM. OK with the proposed departure from the DeepKernel API for keops, assuming DeepKernel is never really meant to be used by the user directly?

@arnaudvl (Contributor, Author) commented Sep 7, 2022:

> Overall LGTM. OK with the proposed departure from the DeepKernel API for keops, assuming DeepKernel is never really meant to be used by the user directly?

No, its intended usage is always within a learned detector. I'll wait until @ascillitoe leaves comments before possibly making some final changes and merging.

alibi_detect/cd/keops/learned_kernel.py (resolved)
alibi_detect/cd/learned_kernel.py (resolved)
alibi_detect/cd/keops/learned_kernel.py (outdated, resolved)
@ascillitoe (Contributor) commented Sep 7, 2022:

@arnaudvl the proposed API for the keops DeepKernel looks sensible to me. It is certainly preferable to making the actual detector more backend-specific. Plus, as you say, I can't see a good reason for someone to call DeepKernel directly. Assuming we're all happy with the proposed API, I'll go through and do a final review this afternoon (in case I missed anything...).

@@ -37,7 +41,16 @@ def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return torch.einsum('ji,ki->jk', self.dense(x), self.dense(y))


tests_lkdrift = ['tensorflow', 'pytorch', 'PyToRcH', 'mxnet']
if has_keops:
    class MyKernelKeops(nn.Module):
Contributor:

Same comment as this one?

@@ -2,14 +2,22 @@
 "cells": [
  {
   "cell_type": "markdown",
-  "metadata": {},
+  "metadata": {
+   "pycharm": {
Contributor:

Super nitpicky: It is preferable to strip out this unnecessary metadata...

@ascillitoe (Contributor) left a review:

LGTM!
