Add factorized embedding layer and tests #10
Conversation
Hi @colehawkins, it looks great! I think it's worth adding a from_layer -- TensorLy should be able to decompose the weights even if they're quite big (as long as they fit in memory). We can add an option to specify which CP function to use (users can already specify which SVD function to use, e.g. randomized). The code for tensorizing the shape is great - I was actually also working on something like this (but for factorized linear layers :) ). I think we should put it in a separate file, e.g. in utils, and have a nice API that accepts the relevant parameters. Finally, should we write the forward pass ourselves, efficiently, using the factorizations' __getitem__?
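As a rough illustration, a from_layer could look something like the sketch below. This is hypothetical, not code from this PR: the function name, its arguments, and the toy shapes are illustrative; only tensorly's parafac (with its svd argument) and the basic PyTorch calls are real APIs.

import torch
import tensorly as tl
from tensorly.decomposition import parafac

tl.set_backend('pytorch')

def from_layer(embedding, tensorized_shape, rank, svd='randomized_svd'):
    """Illustrative sketch: decompose an existing nn.Embedding weight
    with CP, using randomized SVD to keep large weights tractable."""
    with torch.no_grad():
        # Reshape the (num_embeddings, embedding_dim) matrix into a
        # higher-order tensor before decomposing.
        full = embedding.weight.reshape(tensorized_shape)
        cp_tensor = parafac(full, rank=rank, svd=svd)
    return cp_tensor

emb = torch.nn.Embedding(100, 16)
cp = from_layer(emb, tensorized_shape=(10, 10, 4, 4), rank=8)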
Thanks for the feedback! I've had some trouble understanding the most natural way to use the factorized tensor for indexing. I get a tensor back (as expected), but I don't know how to write the equivalent of weight[indices, :], which is roughly what I want, since all of the second dimension should come back.
MODES = ['ascending', 'descending', 'mixed']
CRITERIONS = ['entropy', 'var']

def _to_list(p):
I guess these parts can be removed and imported from utils now.
You can directly index the tensorized tensor as an array:
Thanks for the pointer on indexing. The challenge I ran into is that this indexing style only works out of the box for BlockTT, since CPTensorized returns a CPTensorized when indexing and TuckerTensorized returns a tensor, not a matrix. Is this the desired behavior? I've added some slightly clunky code to catch those two cases, but hopefully this covers the standard use cases. If the embedding layer code looks good, I'm happy to move on to discussing the path forward for from_layer.
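A sketch of what this case-handling could look like. Treat the construction API (tltorch.TensorizedTensor.new, the 'blocktt' name) and the to_matrix() reconstruction method as assumptions about tensorly-torch, not code from this PR:

import torch
import tltorch

# Assumed tltorch API: a factorized (100, 16) embedding weight,
# tensorized to (10, 10) x (4, 4) as a Block-TT factorization.
weight = tltorch.TensorizedTensor.new(
    tensorized_shape=((10, 10), (4, 4)),
    rank=8,
    factorization='blocktt')
weight.normal_(0, 0.02)

def lookup(weight, indices, embedding_dim):
    # Index the tensorized weight like a plain array...
    out = weight[indices]
    # ...but normalize the result: some factorizations may hand back a
    # still-factorized object (hypothetical guard, mirroring the PR text).
    if hasattr(out, 'to_matrix'):
        out = out.to_matrix()
    return out.reshape(indices.shape[0], embedding_dim)

rows = lookup(weight, torch.tensor([0, 3, 7]), embedding_dim=16)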
Very similar to the previous version, except that it uses the new reshaping utilities.
Looks great, thanks! Just a few minor comments, otherwise good to merge!
    assert tensorized_num_embeddings is None and tensorized_embedding_dim is None
except:
    raise ValueError(
        "Automated factorization enabled but tensor dimensions provided")
I would just make an if check:
if auto_reshape and (tensorized_num_embeddings is not None) and ...:
    raise ValueError(...)
What do you think?
I find it useful to make a more detailed error message, e.g. "either use 'auto' or specify tensorized_num_embeddings and tensorized_embedding_dim, but you cannot use both."
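Putting the two suggestions together, the check might read as follows (a sketch: parameter names follow the PR, the message wording is illustrative; in practice this would live in the layer's __init__):

def check_reshape_args(auto_reshape, tensorized_num_embeddings,
                       tensorized_embedding_dim):
    # Explicit check replacing the assert/except, with an actionable message.
    if auto_reshape and not (tensorized_num_embeddings is None
                             and tensorized_embedding_dim is None):
        raise ValueError(
            "Automated factorization enabled but tensor dimensions provided: "
            "either use auto_reshape=True, or specify both "
            "tensorized_num_embeddings and tensorized_embedding_dim, "
            "but you cannot use both.")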
        tensorized_embedding_dim) == embedding_dim
except:
    raise ValueError(
        "Provided tensorization dimensions do not match embedding shape")
Same here, we could tell them exactly what error they are making and how to fix it?
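For example (a sketch; parameter names follow the PR, the message wording is mine):

import math

def check_tensorized_dims(tensorized_embedding_dim, embedding_dim):
    # Explicit check with a detailed message instead of assert/except.
    if math.prod(tensorized_embedding_dim) != embedding_dim:
        raise ValueError(
            f"prod(tensorized_embedding_dim) = "
            f"{math.prod(tensorized_embedding_dim)} does not match "
            f"embedding_dim = {embedding_dim}; the tensorized dimensions "
            "must multiply to the embedding dimension.")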
# TT-Rec: Tensor Train Compression for Deep Learning Recommendation Model Embeddings
target_stddev = 1 / np.sqrt(3 * self.num_embeddings)
with torch.no_grad():
    self.weight.normal_(0, target_stddev)
Would be interesting to see how this compares to the default init from tensorly-torch, e.g. just specifying the std of the reconstruction
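For reference, the two inits side by side (a sketch; I'm assuming tltorch factorized tensors expose an in-place normal_() whose std refers to the reconstructed tensor, so the tltorch line is left as a comment):

import numpy as np
import torch

num_embeddings, embedding_dim = 100, 16
weight = torch.empty(num_embeddings, embedding_dim)

# TT-Rec-style init on the dense weight, as in the diff above.
target_stddev = 1 / np.sqrt(3 * num_embeddings)
with torch.no_grad():
    weight.normal_(0, target_stddev)

# tensorly-torch-style init (assumed API: initializes the factors so the
# *reconstruction* has the requested std):
# factorized_weight.normal_(0, target_stddev)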
@@ -61,7 +61,7 @@ def get_tensorized_shape(in_features, out_features, order=None, min_dim=4, verbo
     if len(in_ten) > merge_size:
         in_ten = merge_ints(in_ten, size=merge_size)
     if len(out_ten) > merge_size:
-        out_ten = merge_ints(in_ten, size=merge_size)
+        out_ten = merge_ints(out_ten, size=merge_size)
Good spot, thanks! :)
Looks good to me, merging. Thanks @colehawkins!
@JeanKossaifi comments welcome!
I didn't include a from_layer initialization method because that use case seemed strange to me, and it may lead users to factorize a very large tensor that requires special treatment (randomized SVD, randomized PARAFAC). I'm happy to add that method for completeness.