
Add factorized embedding layer and tests #10

Merged: 12 commits merged into tensorly:main on Oct 13, 2021
Conversation

@colehawkins (Contributor) commented Aug 28, 2021

@JeanKossaifi comments welcome!

I didn't include a from_layer initialization method because that use case seemed strange to me, and it may lead users to factorize a very large tensor that requires special treatment (randomized SVD, randomized PARAFAC).

I'm happy to add that method for completeness.

@JeanKossaifi (Member) commented Aug 30, 2021

Hi @colehawkins, it looks great!

I think it's worth adding a from_layer -- Tensorly should be able to decompose them even if they're quite big (as long as they fit in memory). We can add an option to specify which CP variant to use (users can already specify which SVD function to use, e.g. randomized).
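
A minimal sketch of what that could look like (the function name is illustrative, and it assumes tensorly's parafac exposes an svd argument with a randomized option; this is not the merged code):

import torch
import tensorly as tl
from tensorly.decomposition import parafac

tl.set_backend('pytorch')

def from_embedding_sketch(embedding, tensorized_shape, rank):
    # Reshape the pretrained (num_embeddings, embedding_dim) weight into
    # the tensorized shape, then decompose it with CP.
    with torch.no_grad():
        full = embedding.weight.reshape(tensorized_shape)
        # A randomized SVD init targets the very-large-tensor case above.
        return parafac(full, rank=rank, svd='randomized_svd')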

The code for tensorizing the shape is great. I was actually also working on something like this (but for factorized linear layers :) ). I think we should put it in a separate file, e.g. in utils, and have a nice API (e.g. get_tensorized_shape) that can be used for all block tensor decompositions. What do you think? I had written some of the functions, but if sympy provides all we need we can consider adding it as a dependency, though I'm always reluctant to add new dependencies unless it's absolutely necessary.
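
For reference, this is the kind of helper sympy would enable (a sketch only, not the PR's implementation; get_tensorized_shape's real signature may differ):

from sympy import factorint

def naive_tensorized_shape(n, order=3):
    # Split n into `order` integer factors using its prime factorization.
    primes = [p for p, mult in factorint(n).items() for _ in range(mult)]
    factors = [1] * order
    # Greedily fold each prime (largest first) into the currently smallest factor.
    for p in sorted(primes, reverse=True):
        factors[factors.index(min(factors))] *= p
    return tuple(sorted(factors))

# naive_tensorized_shape(10000) -> (20, 20, 25)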

For the API, we could accept a parameter, e.g. tensorized_shape, that could be either (tensorized_num_embeddings, tensorized_embedding_dim) or "auto", to specify everything with a single argument. For rank, we could pass "same" or 0.5 to respectively keep the number of parameters the same or halve it.
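
Hypothetical usage under that API (FactorizedEmbedding is the layer this PR adds; the keyword names are the ones proposed above, not necessarily the final ones):

from tltorch import FactorizedEmbedding  # assumed import path

emb = FactorizedEmbedding(num_embeddings=10000, embedding_dim=256,
                          tensorized_shape='auto',  # or e.g. ((20, 20, 25), (4, 8, 8))
                          rank=0.5)                 # half the parameters; 'same' keeps parity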

Finally, should we write the forward pass ourselves, efficiently, using the factorizations' __getitem__?

@colehawkins (Contributor, Author) commented:

Thanks for the feedback!

The from_layer is in, and I've moved over the reshaping code.

I've had some trouble understanding the most natural way to use __getitem__. When I just use __getitem__(input) I get a tensor back (as expected), but I don't know how to write __getitem__([input, :]), which is roughly what I want, since all of the second dimension should come back.

MODES = ['ascending', 'descending', 'mixed']
CRITERIONS = ['entropy', 'var']

def _to_list(p):
@JeanKossaifi (Member) commented on this diff:

I guess these parts can be removed and imported from utils now.

@JeanKossaifi (Member) commented:

You can directly index the tensorized tensor as an array: fact_tensor[indices, :].
The only caveat is that, by default, this will not work if indices is a nested list.
I guess we normally select the same number of rows R for each sample, right?
If so, for N samples, we can have indices be a flat list of size R*N and then just reshape the output to be (N, R) -- we can compare with a naive version using a for loop.
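
A sketch of that flatten-index-reshape forward pass (names are illustrative; fact_tensor is assumed to support the 1-D indexing described above, and this is not the merged code):

import torch

def embedding_forward_sketch(fact_tensor, input_ids, embedding_dim):
    batch_shape = input_ids.shape          # e.g. (N, R) token indices
    flat = input_ids.reshape(-1).tolist()  # flat index list of size N*R
    rows = fact_tensor[flat, :]            # one indexing call, one row per index
    return rows.reshape(*batch_shape, embedding_dim)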

@colehawkins (Contributor, Author) commented:

Thanks for the pointer on indexing.

The challenge that I ran into is that this indexing style only works out of the box for BlockTT since CPTensorized returns a CPTensorized when indexing and TuckerTensorized returns a tensor, not a matrix. Is this the desired behavior?

I've added some slightly clunky code to catch those two cases, but hopefully this covers the standard use cases.
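
Roughly the shape of that case handling (illustrative only; fact_tensor and flat_indices are assumed to come from the surrounding forward pass):

import torch

rows = fact_tensor[flat_indices, :]
if not torch.is_tensor(rows):
    rows = rows.to_tensor()                 # CPTensorized returns a factorized object
rows = rows.reshape(len(flat_indices), -1)  # TuckerTensorized can return a tensor, not a matrix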

If the embedding layer code looks good, happy to move on to discussing the path forward for the utils.py, sympy vs our factorization, and reshaping.

@colehawkins (Contributor, Author) commented:

Very similar to the previous version, except that this one uses the new reshaping from utils.tensorize_shape. I also made a minor edit to utils.tensorize_shape to correct a typo that returned the input dimension tensorization twice.

@JeanKossaifi (Member) left a review:

Looks great, thanks! Just a few minor comments, otherwise good to merge!

    assert tensorized_num_embeddings is None and tensorized_embedding_dim is None
except:
    raise ValueError(
        "Automated factorization enabled but tensor dimensions provided"
@JeanKossaifi (Member) commented:

I would just make an if check:

if auto_reshape and (tensorized_num_embeddings is not None) and ...:
    raise ValueError(...)

What do you think?

I find it useful to make a more detailed error message, e.g. "either use 'auto' or specify tensorized_num_embeddings and tensorized_embedding_dim, but you cannot use both".
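
A concrete version might read (the guard condition and message wording are illustrative, not the merged code):

if auto_reshape and (tensorized_num_embeddings is not None
                     or tensorized_embedding_dim is not None):
    raise ValueError(
        "Got tensorized dimensions while automatic reshaping is enabled. "
        "Either pass 'auto' or specify both tensorized_num_embeddings "
        "and tensorized_embedding_dim, but not both.")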

    tensorized_embedding_dim) == embedding_dim
except:
    raise ValueError(
        "Provided tensorization dimensions do not match embedding shape")
@JeanKossaifi (Member) commented:

Same here: we could tell them exactly what error they are making and how to fix it.
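
For instance (a hypothetical message; checking each product separately is an assumption about the intended test, not the merged code):

from math import prod

if prod(tensorized_num_embeddings) != num_embeddings \
        or prod(tensorized_embedding_dim) != embedding_dim:
    raise ValueError(
        f"tensorized_num_embeddings={tensorized_num_embeddings} and "
        f"tensorized_embedding_dim={tensorized_embedding_dim} do not multiply "
        f"out to the embedding shape ({num_embeddings}, {embedding_dim}).")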

# TT-Rec: Tensor Train Compression for Deep Learning Recommendation Model Embeddings
target_stddev = 1 / np.sqrt(3 * self.num_embeddings)
with torch.no_grad():
    self.weight.normal_(0, target_stddev)
@JeanKossaifi (Member) commented:

It would be interesting to see how this compares to the default init from tensorly-torch, e.g. just specifying the std of the reconstruction.
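
One quick empirical check (a sketch; layer stands for a freshly initialized FactorizedEmbedding, and to_tensor() is assumed to reconstruct the full weight):

import numpy as np

recon = layer.weight.to_tensor()               # reconstruct the full embedding tensor
print(recon.std().item())                      # std under the TT-Rec-style init
print(1 / np.sqrt(3 * layer.num_embeddings))   # TT-Rec target std, for comparison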

@@ -61,7 +61,7 @@ def get_tensorized_shape(in_features, out_features, order=None, min_dim=4, verbo
     if len(in_ten) > merge_size:
         in_ten = merge_ints(in_ten, size=merge_size)
     if len(out_ten) > merge_size:
-        out_ten = merge_ints(in_ten, size=merge_size)
+        out_ten = merge_ints(out_ten, size=merge_size)
@JeanKossaifi (Member) commented:

Good spot, thanks! :)

@JeanKossaifi (Member) commented:

Looks good to me, merging. Thanks @colehawkins!

@JeanKossaifi merged commit f8d13d4 into tensorly:main on Oct 13, 2021