Thank you very much for your excellent work!
In the file "translate.py", line 128, I noticed that when x1 has shape (n, 1), enc1 has shape (1, n, 1024), where n is the number of input tokens (len1).
My question is about enc1: does it represent the sequence of hidden states at the output of the last layer of the encoder model?
For example, can I use the encoder output enc1 as input to a bidirectional LSTM network to perform some kind of source code classification (see the sketch below), or is there a better way?
Moreover, the decoder takes enc1 as input along with len1 and the target language, as follows:
self.decoder.generate(enc1, len1, lang2_id, ...
Accordingly, I assume the decoder maintains its Q, K, V weights to learn how to attend to enc1 of shape (1, n, 1024), which represents the input sequence of length n, and that in this case the enc1 vectors serve as the values V. Is that right?
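For concreteness, this is roughly what I have in mind; the BiLSTM head and the pooling below are just my own illustration, not code from this repository:

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Toy classification head on top of the (frozen) encoder output enc1."""
    def __init__(self, enc_dim=1024, hidden=256, n_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(enc_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, enc1):
        # enc1: (1, n, 1024) -> batch of 1, sequence of n tokens
        out, _ = self.lstm(enc1)      # (1, n, 2 * hidden)
        pooled = out.mean(dim=1)      # average over the n token positions
        return self.fc(pooled)        # (1, n_classes)

# Placeholder tensor standing in for enc1; in practice it would come from the encoder forward pass.
dummy_enc1 = torch.randn(1, 42, 1024)
logits = BiLSTMClassifier()(dummy_enc1)
```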
Best Regards
That's right: enc1 is the output of the last layer of the encoder model, and you could use it (or its average, or just the first token) for downstream tasks. You're also right that we use cross-attention in the decoder; in that case, enc1 provides the keys and values.
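For reference, a minimal sketch of the two points above (the pooling options for downstream use, and cross-attention with enc1 supplying keys and values); the projection layers and decoder states here are placeholders, not the repository's actual modules:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enc1 = torch.randn(1, 7, 1024)  # placeholder for the last-layer encoder states

# Pooling options mentioned above: average over tokens, or just the first token.
mean_repr = enc1.mean(dim=1)    # (1, 1024)
first_tok = enc1[:, 0, :]       # (1, 1024)

# Cross-attention: queries come from the decoder states,
# keys and values are projections of enc1.
dec_states = torch.randn(1, 3, 1024)                 # placeholder decoder states
Wq, Wk, Wv = (nn.Linear(1024, 1024) for _ in range(3))
Q, K, V = Wq(dec_states), Wk(enc1), Wv(enc1)
scores = Q @ K.transpose(1, 2) / 1024 ** 0.5         # (1, 3, 7)
context = F.softmax(scores, dim=-1) @ V              # (1, 3, 1024)
```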