Learned kernel MMD with KeOps backend #602
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #602      +/-   ##
==========================================
- Coverage   83.58%   82.14%    -1.44%
==========================================
  Files         207      209        +2
  Lines       13838    14159      +321
==========================================
+ Hits        11566    11631       +65
- Misses       2272     2528      +256
n_features = [5]
n_instances = [(100, 100), (100, 75)]
kernel_a = ['GaussianRBF', 'MyKernel']
kernel_b = ['GaussianRBF', 'MyKernel', None]
eps = [0.5, 'trainable']
tests_dk = list(product(n_features, n_instances, kernel_a, kernel_b, eps))
n_tests_dk = len(tests_dk)


@pytest.fixture
def deep_kernel_params(request):
    return tests_dk[request.param]


@pytest.mark.skipif(not has_keops, reason='Skipping since pykeops is not installed.')
@pytest.mark.parametrize('deep_kernel_params', list(range(n_tests_dk)), indirect=True)
def test_deep_kernel(deep_kernel_params):
Not a huge fan of using this pattern for parametrizing tests, is there a reason not to parametrize each parameter directly? @ascillitoe
Yeah, the more conventional way to do it would be to parametrize each parameter separately, e.g. see `test_save_model` in `test_saving.py`:
@parametrize('model', [encoder_model])
@parametrize('layer', [None, -1])
def test_save_model(data, model, layer, backend, tmp_path):
This approach has the advantage of giving much more descriptive test names, which is useful when things go wrong.
We keep writing tests with the current pattern for consistency with existing tests. But, unless we are going to refactor the existing tests very soon, maybe we should prioritise adopting/exploring a new pattern so that we have less refactoring to do later...
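For illustration, a hypothetical sketch of the direct-parametrization pattern being suggested, reusing the parameter values from the fixture above (`has_keops` comes from the surrounding test module; the test body and exact signature are placeholders):

```python
import pytest

# Stacked parametrize decorators generate the same cartesian product as tests_dk,
# but each generated test gets an ID built from its parameter values (or an index
# for non-primitive values such as tuples), which makes failures easier to trace.
@pytest.mark.skipif(not has_keops, reason='Skipping since pykeops is not installed.')
@pytest.mark.parametrize('n_features', [5])
@pytest.mark.parametrize('n_instances', [(100, 100), (100, 75)])
@pytest.mark.parametrize('kernel_a', ['GaussianRBF', 'MyKernel'])
@pytest.mark.parametrize('kernel_b', ['GaussianRBF', 'MyKernel', None])
@pytest.mark.parametrize('eps', [0.5, 'trainable'])
def test_deep_kernel(n_features, n_instances, kernel_a, kernel_b, eps):
    ...  # construct the deep kernel from the parameters and assert on its output
```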
Ok, I was not aware of this tbh and just followed the existing patterns.
if has_keops:
    class MyKernel(nn.Module):
        def __init__(self):
            super().__init__()

        def forward(self, x: LazyTensor, y: LazyTensor) -> LazyTensor:
            return (- ((x - y) ** 2).sum(-1)).exp()
Is typing the only reason why this needs to be inside the `if has_keops` block? If so we could just use forward-references to `'LazyTensor'`? @ascillitoe
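For reference, a minimal sketch of the forward-reference idea (hypothetical; it assumes `LazyTensor` is only needed for the type hints, so the runtime import of pykeops can be avoided):

```python
from typing import TYPE_CHECKING

import torch.nn as nn

if TYPE_CHECKING:
    # Imported only for static type checking; not needed at runtime.
    from pykeops.torch import LazyTensor


class MyKernel(nn.Module):
    def __init__(self) -> None:
        super().__init__()

    def forward(self, x: 'LazyTensor', y: 'LazyTensor') -> 'LazyTensor':
        # String annotations (forward references) mean the class no longer has to
        # be defined inside an `if has_keops:` block just to satisfy typing.
        return (- ((x - y) ** 2).sum(-1)).exp()
```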
Good point. Looks like that might be the case...
Did the forward-ref not work?
@jklaise and I were chatting earlier and decided (correct me if I'm wrong) that it's not necessarily a better solution, just a bit different, so I kept it as is.
OK fair enough!
(I don't have a strong opinion either way)
Overall LGTM. Ok with the proposed departure for the `DeepKernel` API for keops, assuming `DeepKernel` is never really meant to be used by the user directly?

No, its intended usage is always within a learned detector. I will wait then until @ascillitoe leaves comments before possibly making some final changes and merging.

@arnaudvl the proposed api for the keops
@@ -37,7 +41,16 @@ def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return torch.einsum('ji,ki->jk', self.dense(x), self.dense(y))


tests_lkdrift = ['tensorflow', 'pytorch', 'PyToRcH', 'mxnet']
if has_keops:
    class MyKernelKeops(nn.Module):
Same comment as this one?
@@ -2,14 +2,22 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "pycharm": {
Super nitpicky: It is preferable to strip out this unnecessary metadata...
LGTM!
Following #548, also extending the learned kernel MMD detector with the KeOps backend to further scale and speed up the detector.

- More testing of `LearnedKernelDrift` is required given the decreased coverage.
- `kernel_a` and optionally `kernel_b` need to be set.
- The KeOps deep kernel is defined in `utils.keops.kernels.DeepKernel`. The main issue here is that `DeepKernel.proj` is not used within the DeepKernel's forward pass, but explicitly called by the learned kernel drift detector. The reason is that we first need to do regular torch tensor computations using the projection, and then compute the kernel matrix with those projected features. KeOps is only used for the latter, not when computing the projection itself. So technically you could pass a separate projection (i.e. `DeepKernel.proj`) to the drift detector and apply a weighted sum kernel later on the original and projected data. But this would break the consistency of the API/input kwargs with the other backends and realistically make the detector harder to understand. As a result, I chose to keep the same DeepKernel format as the PyTorch and TensorFlow backends and deal with this difference directly in the drift detector. This also means there are explicit checks in place (`self.has_proj` and `self.has_kernel_b`) to check whether the DeepKernel format is used and the projection can be done separately. An illustrative sketch of this split is included at the end of this description.
- Expose `num_workers` for both KeOps and PyTorch backends, since it can make a significant difference to dataloader speed for higher numbers of instances (Add `num_workers` to PyTorch/KeOps backend where relevant #611).
- The KeOps backend adds a `batch_size_predict` kwarg which I would also like to add to the PyTorch backend. The reason is that the optimal batch size for training can be wildly different from that for prediction (where we just care about being as fast as possible within our compute budget). So if we e.g. pick `batch_size=32` for training, we might want to change this to `batch_size_predict=1000` for prediction using the trained kernel. There is also another reason why they can be very different: during training of the detector, the whole training batch (all tensors incl. the projection etc.) needs to fit on the GPU. But for predictions across all permutations at once, we can first compute all the projections separately, and then lazily evaluate both the projections and original instances for all permutations. This means we can likely get away with much higher batch sizes for the projection predictions (Add `batch_size_predict` as kwarg to PyTorch backend for learned detectors. #612).

The smaller potential PyTorch changes (`num_workers` and `batch_size_predict`) can be done in a quick follow-up PR.
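To illustrate the projection/kernel split described above, here is a minimal, hypothetical sketch (not the actual detector code): the projection is a regular dense torch computation, and only the kernel matrix over the projected features is evaluated lazily with KeOps. The names `gaussian_kernel` and `deep_kernel_matrix` and the plain RBF kernel are illustrative stand-ins for `kernel_a`; the weighted sum with `kernel_b` on the original inputs is omitted.

```python
import torch
import torch.nn as nn
from pykeops.torch import LazyTensor


def gaussian_kernel(x: LazyTensor, y: LazyTensor) -> LazyTensor:
    # Illustrative stand-in for kernel_a: an RBF kernel on KeOps LazyTensors.
    return (-((x - y) ** 2).sum(-1)).exp()


def deep_kernel_matrix(x: torch.Tensor, y: torch.Tensor, proj: nn.Module) -> torch.Tensor:
    # 1. The projection is a regular (dense) torch computation. In the detector it is
    #    called explicitly, outside the kernel's forward pass.
    x_proj, y_proj = proj(x), proj(y)

    # 2. Only the kernel evaluation uses KeOps: wrap the projected features as
    #    LazyTensors so the full (n, m) kernel matrix is never materialised in memory.
    x_lazy = LazyTensor(x_proj[:, None, :].contiguous())  # symbolic shape (n, 1, d)
    y_lazy = LazyTensor(y_proj[None, :, :].contiguous())  # symbolic shape (1, m, d)
    k_lazy = gaussian_kernel(x_lazy, y_lazy)              # lazy (n, m) kernel matrix

    # Reductions (here: row sums, as used in MMD-style estimates) return dense tensors.
    return k_lazy.sum(dim=1)
```

In practice the kernel is never called like this by the user: as noted in the conversation above, its intended usage is always within the learned drift detector, which is also where the `batch_size` and `batch_size_predict` kwargs discussed in this description apply.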