
Using the encoder for a downstream task #28

IbrahimSobh opened this issue Dec 29, 2020 · 1 comment

IbrahimSobh commented Dec 29, 2020

Thank you very much for your excellent work!

In the file "translate.py", at line 128:

enc1 = self.encoder('fwd', x=x1, lengths=len1,
                                langs=langs1, causal=False) 

I noticed that when the shape of x1 is (n, 1), the shape of enc1 is (1, n, 1024), where n is the number of input tokens (len1).

My question is about enc1: does it represent the sequence of hidden states at the output of the last layer of the encoder model?

For example, can I use the encoder output enc1 as the input to a bidirectional LSTM network to perform some kind of source-code classification, or is there a better way? I have in mind something like the rough sketch below.
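
For illustration only, here is a rough sketch of the kind of thing I mean (not code from this repository; I am assuming enc1 comes back with shape (1, n, 1024), i.e. (batch, n_tokens, dim), and the hidden size and number of classes are made up):

import torch
import torch.nn as nn

class CodeClassifier(nn.Module):
    # Toy BiLSTM head on top of the (frozen) encoder output; names are illustrative.
    def __init__(self, emb_dim=1024, hidden_dim=256, n_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, n_classes)

    def forward(self, enc1):
        # enc1: (batch, n_tokens, 1024) -- the encoder hidden states discussed above
        _, (h_n, _) = self.lstm(enc1)               # h_n: (2, batch, hidden_dim)
        h = torch.cat([h_n[0], h_n[1]], dim=-1)     # final forward/backward states
        return self.fc(h)                           # (batch, n_classes) logits

# e.g. with a dummy enc1 of shape (1, n, 1024), n = 50:
logits = CodeClassifier()(torch.randn(1, 50, 1024))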

Moreover, the decoder takes enc1 as input, along with len1 and the target language, as follows:

self.decoder.generate(enc1, len1, lang2_id, ...

Accordingly, I assume that the decoder maintains its Q, K, V weights to learn how to attend to enc1 of shape (1, n, 1024), which represents the input sequence of length n, and that in this case the enc1 vectors provide the values V. Is that right?

Best Regards

@baptisteroziere
Contributor

That's right: enc1 is the output of the last layer of the encoder model, and you could use it (or its average, or just its first token) for downstream tasks. You're also right that we use cross-attention in the decoder (in that case, enc1 is used for the values and keys).
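
For reference, a minimal sketch of the two pooling options mentioned above (not code from this repository; enc1 is assumed to have shape (1, n, 1024), i.e. (batch, n_tokens, dim), and the classifier size is made up):

import torch
import torch.nn as nn

n = 32                                    # number of input tokens (dummy value)
enc1 = torch.randn(1, n, 1024)            # stand-in for the last-layer encoder states

def pool_encoder_output(enc1, mode="mean"):
    # Collapse the token dimension into a single 1024-d representation.
    if mode == "mean":
        return enc1.mean(dim=1)           # average over the n tokens -> (1, 1024)
    if mode == "first":
        return enc1[:, 0, :]              # first token only -> (1, 1024)
    raise ValueError(mode)

classifier = nn.Linear(1024, 10)          # e.g. 10 code classes (purely illustrative)
logits = classifier(pool_encoder_output(enc1, mode="mean"))   # shape (1, 10)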
