
How to use output of every token in a lstm sequence #10771

Closed
April0402 opened this issue May 18, 2018 · 10 comments
Labels
User (used to tag user questions)

Comments

@April0402

I want to add attention to an LSTM model. According to the v2 API, I should use lstmemory_group, but the API documentation is too brief and I cannot work out how to use this layer.
By the way, this attention is not based on information from the same sequence. My model is a machine reading comprehension model that uses attention based on the similarity between the paragraph sequence and the query sequence. That's why I cannot use the demo in book chapter 08, whose attention is based on tokens within the same sequence.

@wanghaoshuang
Contributor

wanghaoshuang commented May 18, 2018

@April0402 Does the following demo meet your needs?

def gru_decoder_with_attention(enc_vec, enc_proj, current_word, some_input_for_attention):
    # Read the decoder state produced at the previous step.
    decoder_mem = memory(name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)
    # Placeholder call: pass whatever the attention should condition on
    # (here the external input) into your attention function.
    context = simple_attention(some_attr=some_input_for_attention)

    with mixed_layer(size=decoder_size * 3) as decoder_inputs:
        decoder_inputs += full_matrix_projection(input=context)
        decoder_inputs += full_matrix_projection(input=current_word)

    gru_step = gru_step_layer(
        name='gru_decoder',
        input=decoder_inputs,
        output_mem=decoder_mem,
        size=decoder_size)

    with mixed_layer(
            size=num_classes, bias_attr=True,
            act=SoftmaxActivation()) as out:
        out += full_matrix_projection(input=gru_step)

    return out

some_input_for_attention = ...

group_inputs = [
    StaticInput(input=encoded_vector, is_seq=True),
    StaticInput(input=encoded_proj, is_seq=True),
    trg_embedding,
    some_input_for_attention,
]

decoder_out = recurrent_group(
    name=decoder_group_name,
    step=gru_decoder_with_attention,
    input=group_inputs)

cost = classification_cost(input=decoder_out, label=label_data)

@wanghaoshuang
Contributor

Also, we recommend using the Paddle Fluid API; we will no longer update or maintain the Paddle v2 API going forward.

@wanghaoshuang added the User (used to tag user questions) label on May 18, 2018
@April0402
Author

@wanghaoshuang As I mentioned in the issue description, this demo doesn't quite match my need. The demo's attention is the fairly typical translation-model kind, based on other (preceding) words in the same sequence, whereas my attention is computed from external information.
Also, the API docs don't explain what the arguments of this step function actually are, so even after reading the demo I still don't know how to write it.
As for migrating to Fluid: I looked into it once before and the migration cost seems quite high. Our project is on a tight schedule and we probably can't fit the migration in right now, so I'd appreciate help with getting the attention I want working correctly on v2. We'll do the migration later once we have the bandwidth. Thanks a lot.

@wanghaoshuang
Contributor

wanghaoshuang commented May 23, 2018

> the attention is based on other words in this sequence

The attention in the demo I pasted above doesn't use only the other words in the current sequence; it also takes an external input, some_input_for_attention, and that input can be set up quite flexibly.

@wanghaoshuang
Contributor

Given your requirement, some_input_for_attention could be the similarity between the paragraph sequence and the query, expanded into a sequence.
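
For illustration only, assuming the standard trainer_config_helpers layers cos_sim and expand_layer, plus hypothetical variables paragraph_vec and query_vec (pooled paragraph/query vectors) and target_seq (the sequence the recurrent_group steps over), building such an expanded similarity input might look roughly like this:

# Hypothetical sketch: turn an external paragraph/query similarity score into
# a sequence-shaped input that can be fed into recurrent_group.
similarity = cos_sim(a=paragraph_vec, b=query_vec)

# Broadcast the single score along the stepped sequence so that every decoding
# step receives it as some_input_for_attention.
some_input_for_attention = expand_layer(input=similarity, expand_as=target_seq)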

@wanghaoshuang
Contributor

@pkuyym Do we have a demo showing more flexible uses of attention with the Paddle v2 API?

@April0402
Author

Is there a complete version of this demo? Or any documentation for this step function? The API docs say too little to figure out how to actually use it.
In recurrent_group(name=decoder_group_name, step=gru_decoder_with_attention, input=group_inputs), the entries of input are the arguments of the step function, right?
Also, the way memory and simple_attention are used here doesn't match what the API docs describe.
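
For reference, the entries of input in recurrent_group are passed to the step function as positional arguments, one per entry and in the same order; a short sketch of that mapping, reusing the names from the demo above:

# Sketch of the input-to-argument mapping (names from the demo above):
#   group_inputs[0] -> enc_vec                   (StaticInput, whole sequence)
#   group_inputs[1] -> enc_proj                  (StaticInput, whole sequence)
#   group_inputs[2] -> current_word              (trg_embedding, one token per step)
#   group_inputs[3] -> some_input_for_attention  (the external attention input)
decoder_out = recurrent_group(
    name=decoder_group_name,
    step=gru_decoder_with_attention,
    input=group_inputs)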

@pkuyym
Contributor

pkuyym commented May 23, 2018

@April0402 Most v2 sequence models are built on recurrent_group plus a recurrent decoder step function, and fairly complex logic can be implemented inside that step function. Some of the more involved models use this combination quite heavily; I'd suggest taking a look at:

  1. https://github.com/PaddlePaddle/models/tree/develop/mt_with_external_memory
  2. Add demo for ntm_addressing_mechanism models#56

@April0402
Author

@wanghaoshuang One question: in simple_attention, the two sequences encoded_sequence and encoded_proj can have different lengths, right?

@wanghaoshuang
Contributor

wanghaoshuang commented May 23, 2018

@April0402

> in simple_attention, the two sequences encoded_sequence and encoded_proj can have different lengths, right?

Yes, they can have different lengths. You can refer to the implementation here: https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/trainer_config_helpers/networks.py#L1400

Alternatively, you can also implement your own attention function.
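
As a rough sketch of what such a custom attention function could look like, modelled on the linked simple_attention implementation and assuming the usual trainer_config_helpers layers (expand_layer, mixed_layer, scaling_layer, pooling_layer); all names below are illustrative:

# Illustrative custom attention in the style of simple_attention.
def my_attention(encoded_sequence, encoded_proj, decoder_state, name='my_att'):
    # Project the decoder state and broadcast it along the encoded sequence.
    with mixed_layer(size=encoded_proj.size, name=name + '_state_proj') as state_proj:
        state_proj += full_matrix_projection(input=decoder_state)
    expanded = expand_layer(
        input=state_proj, expand_as=encoded_sequence, name=name + '_expand')

    # Combine the broadcast state with the per-token projection.
    with mixed_layer(size=encoded_proj.size, act=TanhActivation(),
                     name=name + '_combine') as combined:
        combined += identity_projection(input=expanded)
        combined += identity_projection(input=encoded_proj)

    # One score per token, normalized over the whole sequence.
    with mixed_layer(size=1, act=SequenceSoftmaxActivation(),
                     name=name + '_weight') as weight:
        weight += full_matrix_projection(input=combined)

    # Weighted sum of the encoded sequence -> context vector.
    scaled = scaling_layer(weight=weight, input=encoded_sequence, name=name + '_scale')
    return pooling_layer(input=scaled, pooling_type=SumPooling(), name=name + '_context')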
