How to use the output of every token in an LSTM sequence #10771
Comments
@April0402 Does the following demo meet your needs?

```python
def gru_decoder_with_attention(enc_vec, enc_proj, current_word,
                               some_input_for_attention):
    decoder_mem = memory(
        name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)

    context = simple_attention(some_attr=some_input_for_attention)

    with mixed_layer(size=decoder_size * 3) as decoder_inputs:
        decoder_inputs += full_matrix_projection(input=context)
        decoder_inputs += full_matrix_projection(input=current_word)

    gru_step = gru_step_layer(
        name='gru_decoder',
        input=decoder_inputs,
        output_mem=decoder_mem,
        size=decoder_size)

    with mixed_layer(
            size=num_classes, bias_attr=True,
            act=SoftmaxActivation()) as out:
        out += full_matrix_projection(input=gru_step)
    return out


some_input_for_attention = ...
group_inputs = [
    StaticInput(input=encoded_vector, is_seq=True),
    StaticInput(input=encoded_proj, is_seq=True),
    trg_embedding,
    some_input_for_attention,
]
decoder_out = recurrent_group(
    name=decoder_group_name,
    step=gru_decoder_with_attention,
    input=group_inputs)
cost = classification_cost(input=decoder_out, label=label_data)
```
Also, we recommend using the Paddle Fluid API; the Paddle v2 API will no longer be updated or maintained.
@wanghaoshuang As I explained in the issue, this demo does not quite match my requirement. The demo's attention is the typical translation-model attention, computed over other (earlier) tokens of the same sequence, whereas my attention is driven by external information.
The attention in the demo I pasted above does not use only the other tokens of the current sequence; it also takes an extra external input.
Based on this requirement: @pkuyym do we have any demo of more flexible uses of attention in the Paddle v2 API?
Is there a complete version of this demo, or any documentation for the step function? The description in the API docs is too sparse to tell how it should actually be used.
@April0402 Most v2 sequence models are built from recurrent_group plus a recurrent_decoder_step; fairly complex logic can be implemented inside the recurrent_decoder_step. Some of the more involved models make heavy use of this combination, so I suggest taking a look at those.
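Since the step function is not documented much in this thread, here is a minimal, hedged sketch of the recurrent_group + step pattern referred to above. All layer names and sizes are illustrative, not part of the original demo: recurrent_group calls the step function once per time step, ordinary sequence inputs are sliced to a single element per call, StaticInput entries are handed to the step unchanged, and memory() carries the output produced under the same name at the previous step.

```python
hidden_size = 128  # illustrative size

def step(word, external_vec):
    # Output of this same step at the previous time step, looked up by
    # name; zero-initialized at t = 0.
    prev_state = memory(name='demo_rnn_state', size=hidden_size)

    # gru_step_layer expects an input of width 3 * hidden_size.
    with mixed_layer(size=hidden_size * 3) as step_input:
        step_input += full_matrix_projection(input=word)
        step_input += full_matrix_projection(input=external_vec)

    return gru_step_layer(
        name='demo_rnn_state',   # must match the memory name above
        input=step_input,
        output_mem=prev_state,
        size=hidden_size)

# word_embedding is a sequence layer: it is sliced to one token per step.
# some_external_layer is assumed to be a non-sequence layer; wrapped in
# StaticInput it is passed to `step` whole at every time step.
rnn_out = recurrent_group(
    name='demo_rnn',
    step=step,
    input=[word_embedding, StaticInput(input=some_external_layer)])
# rnn_out contains one hidden vector per input token, i.e. the
# per-token outputs asked about in the title of this issue.
```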
@wanghaoshuang One more question: in simple_attention, the two sequences encoded_sequence and encoded_proj can have different lengths, right?
Yes, they can differ. You can refer to the implementation here: https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/trainer_config_helpers/networks.py#L1400 Alternatively, you can implement your own attention function.
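For reference, a hand-rolled attention function in the style of the simple_attention implementation linked above might look like the sketch below. The layer names, and the convention of passing the attended sequence and its projection separately, are illustrative only:

```python
def my_attention(encoded_sequence, encoded_proj, decoder_state, name='my_att'):
    proj_size = encoded_proj.size

    # Project the current decoder state and broadcast it across the
    # time steps of the attended sequence.
    with mixed_layer(size=proj_size, name='%s_transform' % name) as transformed:
        transformed += full_matrix_projection(input=decoder_state)
    expanded = expand_layer(
        input=transformed, expand_as=encoded_sequence, name='%s_expand' % name)

    # Combine the broadcast state with the precomputed projection of the
    # attended sequence.
    with mixed_layer(
            size=proj_size, act=TanhActivation(),
            name='%s_combine' % name) as combined:
        combined += identity_projection(input=expanded)
        combined += identity_projection(input=encoded_proj)

    # One score per position, normalized over the whole sequence.
    weights = fc_layer(
        input=combined, size=1, act=SequenceSoftmaxActivation(),
        bias_attr=False, name='%s_softmax' % name)

    # Weighted sum of the attended sequence gives the context vector.
    scaled = scaling_layer(
        weight=weights, input=encoded_sequence, name='%s_scaling' % name)
    return pooling_layer(
        input=scaled, pooling_type=SumPooling(), name='%s_pooling' % name)
```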
I want to add attention to an LSTM model. According to the v2 API, I should use lstmemory_group, but the API documentation is too brief and simple for me to understand how to use this layer.
Btw, this attention is not based on information from within the sequence itself. My model is a machine reading comprehension model, which uses attention based on the similarity between the paragraph sequence and the query sequence. That's why I cannot use the demo in book chapter 08, whose attention is based on tokens in the same sequence.