Add factorized embedding layer and tests #10
Conversation
Hi @colehawkins, it looks great! I think it's worth adding a from_layer -- TensorLy should be able to decompose the weights even if they're quite big (as long as they fit in memory). We can add an option to specify which CP function to use (users can already specify which SVD function to use, e.g. randomized). The code for tensorizing the shape is great - I was actually also working on something like this (but for factorized linear layers :) ). I think we should put it in a separate file, e.g. in utils, and have a nice API that accepts the relevant parameters. Finally, should we write the forward pass ourselves, efficiently, using the factorizations' __getitem__?
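As a rough illustration, a from_layer could look something like the sketch below. This is hypothetical, not code from this PR: the function name, its arguments, and the toy shapes are illustrative; only tensorly's parafac (with its svd argument) and the basic PyTorch calls are real APIs.

import torch
import tensorly as tl
from tensorly.decomposition import parafac

tl.set_backend('pytorch')

def from_layer(embedding, tensorized_shape, rank, svd='randomized_svd'):
    """Illustrative sketch: decompose an existing nn.Embedding weight
    with CP, using randomized SVD to keep large weights tractable."""
    with torch.no_grad():
        # Reshape the (num_embeddings, embedding_dim) matrix into a
        # higher-order tensor before decomposing.
        full = embedding.weight.reshape(tensorized_shape)
        cp_tensor = parafac(full, rank=rank, svd=svd)
    return cp_tensor

emb = torch.nn.Embedding(100, 16)
cp = from_layer(emb, tensorized_shape=(10, 10, 4, 4), rank=8)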
Thanks for the feedback! I've had some trouble understanding the most natural way to use the factorized tensor for indexing. I get a tensor back (as expected), but I don't know how to write the equivalent of weight[indices, :], which is roughly what I want, since all of the second dimension should come back.
MODES = ['ascending', 'descending', 'mixed']
CRITERIONS = ['entropy', 'var']

def _to_list(p):
I guess these parts can be removed and imported from utils now.
You can directly index the tensorized tensor as an array:
Thanks for the pointer on indexing. The challenge I ran into is that this indexing style only works out of the box for BlockTT, since CPTensorized returns a CPTensorized when indexing and TuckerTensorized returns a tensor, not a matrix. Is this the desired behavior? I've added some slightly clunky code to catch those two cases, but hopefully this covers the standard use cases. If the embedding layer code looks good, I'm happy to move on to discussing the path forward for from_layer.
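A sketch of what this case-handling could look like. Treat the construction API (tltorch.TensorizedTensor.new, the 'blocktt' name) and the to_matrix() reconstruction method as assumptions about tensorly-torch, not code from this PR:

import torch
import tltorch

# Assumed tltorch API: a factorized (100, 16) embedding weight,
# tensorized to (10, 10) x (4, 4) as a Block-TT factorization.
weight = tltorch.TensorizedTensor.new(
    tensorized_shape=((10, 10), (4, 4)),
    rank=8,
    factorization='blocktt')
weight.normal_(0, 0.02)

def lookup(weight, indices, embedding_dim):
    # Index the tensorized weight like a plain array...
    out = weight[indices]
    # ...but normalize the result: some factorizations may hand back a
    # still-factorized object (hypothetical guard, mirroring the PR text).
    if hasattr(out, 'to_matrix'):
        out = out.to_matrix()
    return out.reshape(indices.shape[0], embedding_dim)

rows = lookup(weight, torch.tensor([0, 3, 7]), embedding_dim=16)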
Very similar to the previous version, except that it uses the new reshaping utilities.
Looks great, thanks! Just a few minor comments, otherwise good to merge!
    assert tensorized_num_embeddings is None and tensorized_embedding_dim is None
except:
    raise ValueError(
        "Automated factorization enabled but tensor dimensions provided")
I would just make an if check:
if auto_reshape and (tensorized_num_embeddings is not None) and ...:
    raise ValueError(...)
What do you think?
I find it useful to make a more detailed error message, e.g. "either use 'auto' or specify tensorized_num_embeddings and tensorized_embedding_dim, but you cannot use both."
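Putting the two suggestions together, the check might read as follows (a sketch: parameter names follow the PR, the message wording is illustrative; in practice this would live in the layer's __init__):

def check_reshape_args(auto_reshape, tensorized_num_embeddings,
                       tensorized_embedding_dim):
    # Explicit check replacing the assert/except, with an actionable message.
    if auto_reshape and not (tensorized_num_embeddings is None
                             and tensorized_embedding_dim is None):
        raise ValueError(
            "Automated factorization enabled but tensor dimensions provided: "
            "either use auto_reshape=True, or specify both "
            "tensorized_num_embeddings and tensorized_embedding_dim, "
            "but you cannot use both.")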
        tensorized_embedding_dim) == embedding_dim
except:
    raise ValueError(
        "Provided tensorization dimensions do not match embedding shape")
Same here, we could tell them exactly what error they are making and how to fix it?
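For example (a sketch; parameter names follow the PR, the message wording is mine):

import math

def check_tensorized_dims(tensorized_embedding_dim, embedding_dim):
    # Explicit check with a detailed message instead of assert/except.
    if math.prod(tensorized_embedding_dim) != embedding_dim:
        raise ValueError(
            f"prod(tensorized_embedding_dim) = "
            f"{math.prod(tensorized_embedding_dim)} does not match "
            f"embedding_dim = {embedding_dim}; the tensorized dimensions "
            "must multiply to the embedding dimension.")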
# TT-Rec: Tensor Train Compression for Deep Learning Recommendation Model Embeddings
target_stddev = 1 / np.sqrt(3 * self.num_embeddings)
with torch.no_grad():
    self.weight.normal_(0, target_stddev)
Would be interesting to see how this compares to the default init from tensorly-torch, e.g. just specifying the std of the reconstruction
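For reference, the two inits side by side (a sketch; I'm assuming tltorch factorized tensors expose an in-place normal_() whose std refers to the reconstructed tensor, so the tltorch line is left as a comment):

import numpy as np
import torch

num_embeddings, embedding_dim = 100, 16
weight = torch.empty(num_embeddings, embedding_dim)

# TT-Rec-style init on the dense weight, as in the diff above.
target_stddev = 1 / np.sqrt(3 * num_embeddings)
with torch.no_grad():
    weight.normal_(0, target_stddev)

# tensorly-torch-style init (assumed API: initializes the factors so the
# *reconstruction* has the requested std):
# factorized_weight.normal_(0, target_stddev)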
@@ -61,7 +61,7 @@ def get_tensorized_shape(in_features, out_features, order=None, min_dim=4, verbo
     if len(in_ten) > merge_size:
         in_ten = merge_ints(in_ten, size=merge_size)
     if len(out_ten) > merge_size:
-        out_ten = merge_ints(in_ten, size=merge_size)
+        out_ten = merge_ints(out_ten, size=merge_size)
Good spot, thanks! :)
Looks good to me, merging. Thanks @colehawkins!
@JeanKossaifi comments welcome!
I didn't include a from_layer initialization method because that use case seemed strange to me, and it may lead users to factorize a very large tensor that requires special treatment (randomized SVD, randomized PARAFAC). I'm happy to add that method for completeness.