
Using the encoder for a downstream task #28

IbrahimSobh opened this issue Dec 29, 2020 · 1 comment

IbrahimSobh commented Dec 29, 2020

Thank you very much for your excellent work!

In the file "translate.py", at line 128:

enc1 = self.encoder('fwd', x=x1, lengths=len1,
                                langs=langs1, causal=False) 

I noticed that when the shape of x1 is (n, 1), the shape of enc1 is (1, n, 1024), where n is the number of input tokens (len1).

My question is about enc1: does it represent the sequence of hidden states at the output of the last layer of the encoder model?

For example, can I use the encoder output enc1 as the input to a bidirectional LSTM network to perform some kind of source-code classification, or is there a better way? I have in mind something like the rough sketch below.
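
For illustration only, here is a rough sketch of the kind of thing I mean (not code from this repository; I am assuming enc1 comes back with shape (1, n, 1024), i.e. (batch, n_tokens, dim), and the hidden size and number of classes are made up):

import torch
import torch.nn as nn

class CodeClassifier(nn.Module):
    # Toy BiLSTM head on top of the (frozen) encoder output; names are illustrative.
    def __init__(self, emb_dim=1024, hidden_dim=256, n_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, n_classes)

    def forward(self, enc1):
        # enc1: (batch, n_tokens, 1024) -- the encoder hidden states discussed above
        _, (h_n, _) = self.lstm(enc1)               # h_n: (2, batch, hidden_dim)
        h = torch.cat([h_n[0], h_n[1]], dim=-1)     # final forward/backward states
        return self.fc(h)                           # (batch, n_classes) logits

# e.g. with a dummy enc1 of shape (1, n, 1024), n = 50:
logits = CodeClassifier()(torch.randn(1, 50, 1024))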

Moreover, the decoder takes enc1 as input, along with len1 and the target language, as follows:

self.decoder.generate(enc1, len1, lang2_id, ...

Accordingly, I assume that the decoder maintains its Q, K, V weights to learn how to attend to enc1 of shape (1, n, 1024), which represents the input sequence of length n, and that in this case the enc1 vectors provide the values V. Is that right?

Best Regards

@baptisteroziere
Contributor

That's right: enc1 is the output of the last layer of the encoder model, and you could use it (or its average, or just its first token) for downstream tasks. You're also right that we use cross-attention in the decoder (in that case, enc1 is used for the values and keys).
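
For reference, a minimal sketch of the two pooling options mentioned above (not code from this repository; enc1 is assumed to have shape (1, n, 1024), i.e. (batch, n_tokens, dim), and the classifier size is made up):

import torch
import torch.nn as nn

n = 32                                    # number of input tokens (dummy value)
enc1 = torch.randn(1, n, 1024)            # stand-in for the last-layer encoder states

def pool_encoder_output(enc1, mode="mean"):
    # Collapse the token dimension into a single 1024-d representation.
    if mode == "mean":
        return enc1.mean(dim=1)           # average over the n tokens -> (1, 1024)
    if mode == "first":
        return enc1[:, 0, :]              # first token only -> (1, 1024)
    raise ValueError(mode)

classifier = nn.Linear(1024, 10)          # e.g. 10 code classes (purely illustrative)
logits = classifier(pool_encoder_output(enc1, mode="mean"))   # shape (1, 10)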
