@@ -125,9 +125,9 @@ Decoder
 
 Our Decoder will predict the next word, conditioned on the Encoder's final
 hidden state and an embedded representation of the previous target word -- which
-is sometimes called *input feeding* or *teacher forcing*. More specifically,
-we'll use a :class:`torch.nn.LSTM` to produce a sequence of hidden states that
-we'll project to the size of the output vocabulary to predict each target word.
+is sometimes called *teacher forcing*. More specifically, we'll use a
+:class:`torch.nn.LSTM` to produce a sequence of hidden states that we'll project
+to the size of the output vocabulary to predict each target word.
 
 ::
 
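As an aside on the paragraph revised in the hunk above: at training time the decoder simply embeds the gold previous tokens (teacher forcing), runs them through an LSTM seeded with the encoder's final hidden state, and projects each hidden state to vocabulary logits. A minimal, self-contained sketch of that idea follows -- this is illustrative only, not the decoder class defined in the tutorial, and the class name and dimensions are arbitrary assumptions::

    import torch
    import torch.nn as nn

    class TinyLSTMDecoder(nn.Module):
        """Illustrative sketch only -- not the fairseq decoder API."""

        def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out_proj = nn.Linear(hidden_dim, vocab_size)

        def forward(self, prev_output_tokens, encoder_hidden):
            # prev_output_tokens: (batch, tgt_len), the shifted gold targets
            # encoder_hidden: (1, batch, hidden_dim), encoder's final hidden state
            x = self.embed(prev_output_tokens)
            init_state = (encoder_hidden, torch.zeros_like(encoder_hidden))
            hiddens, _ = self.lstm(x, init_state)   # (batch, tgt_len, hidden_dim)
            return self.out_proj(hiddens)           # (batch, tgt_len, vocab_size)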
@@ -171,7 +171,7 @@ we'll project to the size of the output vocabulary to predict each target word.
         """
         Args:
             prev_output_tokens (LongTensor): previous decoder outputs of shape
-                `(batch, tgt_len)`, for input feeding/teacher forcing
+                `(batch, tgt_len)`, for teacher forcing
             encoder_out (Tensor, optional): output from the encoder, used for
                 encoder-side attention
 
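For readers skimming this hunk: ``prev_output_tokens`` is what carries the teacher-forcing signal, i.e. during training it is typically the gold target sequence shifted right by one position, so that step *t* conditions on the true token from step *t-1*. A toy illustration of that shift (the choice of ``2`` as the start-marker index here is an assumption, not something specified in this diff)::

    import torch

    eos = 2                                  # assumed special-token index
    targets = torch.tensor([[4, 5, 6, 7]])   # (batch=1, tgt_len=4)
    # Prepend the start marker and drop the last token so that position t
    # sees the gold token from position t-1.
    prev_output_tokens = torch.cat(
        [targets.new_full((1, 1), eos), targets[:, :-1]], dim=1
    )
    print(prev_output_tokens)                # tensor([[2, 4, 5, 6]])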
@@ -387,8 +387,8 @@ previous hidden states.
 
 In fairseq this is called :ref:`Incremental decoding`. Incremental decoding is a
 special mode at inference time where the Model only receives a single timestep
-of input corresponding to the immediately previous output token (for input
-feeding) and must produce the next output incrementally. Thus the model must
+of input corresponding to the immediately previous output token (for teacher
+forcing) and must produce the next output incrementally. Thus the model must
 cache any long-term state that is needed about the sequence, e.g., hidden
 states, convolutional states, etc.
 
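To make the cached-state requirement above concrete, here is a generic sketch of incremental generation with an LSTM, where the recurrent ``(h, c)`` state is the only thing carried between timesteps. This is a standalone illustration, not the fairseq incremental decoding API referenced in the hunk; the sizes and start token are assumptions::

    import torch
    import torch.nn as nn

    embed = nn.Embedding(1000, 32)
    lstm = nn.LSTM(32, 64, batch_first=True)
    out_proj = nn.Linear(64, 1000)

    token = torch.tensor([[2]])   # assumed start-of-sequence index, shape (batch=1, 1)
    state = None                  # cached (h, c); None starts the LSTM from zeros
    generated = []
    for _ in range(5):            # greedy decoding, one timestep of input per call
        x = embed(token)                          # (1, 1, 32) -- single timestep only
        hidden, state = lstm(x, state)            # reuse and update the cached state
        token = out_proj(hidden).argmax(dim=-1)   # (1, 1) next predicted token
        generated.append(token.item())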