
How to use output of every token in a lstm sequence #10771

Closed
April0402 opened this issue May 18, 2018 · 10 comments
Labels
User (used to tag user questions)

Comments

@April0402

I want to add attention to an LSTM model. According to the v2 API, I should use lstmemory_group, but the API documentation is too brief and I cannot work out how to use this layer.
By the way, this attention is not based on information from the same sequence. My model is a machine reading comprehension model that uses attention based on the similarity between the paragraph sequence and the query sequence. That's why I cannot use the demo in book chapter 08, whose attention is based on tokens within the same sequence.

@wanghaoshuang
Contributor

wanghaoshuang commented May 18, 2018

@April0402 Does the following demo meet your needs?

def gru_decoder_with_attention(enc_vec, enc_proj, current_word, some_input_for_attention):
    # Read the decoder state produced at the previous step.
    decoder_mem = memory(name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)
    # Placeholder call: pass whatever the attention should condition on
    # (here the external input) into your attention function.
    context = simple_attention(some_attr=some_input_for_attention)

    with mixed_layer(size=decoder_size * 3) as decoder_inputs:
        decoder_inputs += full_matrix_projection(input=context)
        decoder_inputs += full_matrix_projection(input=current_word)

    gru_step = gru_step_layer(
        name='gru_decoder',
        input=decoder_inputs,
        output_mem=decoder_mem,
        size=decoder_size)

    with mixed_layer(
            size=num_classes, bias_attr=True,
            act=SoftmaxActivation()) as out:
        out += full_matrix_projection(input=gru_step)

    return out

some_input_for_attention = ...

group_inputs = [
    StaticInput(input=encoded_vector, is_seq=True),
    StaticInput(input=encoded_proj, is_seq=True),
    trg_embedding,
    some_input_for_attention,
]

decoder_out = recurrent_group(
    name=decoder_group_name,
    step=gru_decoder_with_attention,
    input=group_inputs)

cost = classification_cost(input=decoder_out, label=label_data)

@wanghaoshuang
Contributor

Also, we recommend using the Paddle Fluid API; we will no longer update or maintain the Paddle v2 API going forward.

@wanghaoshuang added the User (used to tag user questions) label on May 18, 2018
@April0402
Author

@wanghaoshuang As I mentioned in the issue description, this demo doesn't quite match my need. The demo's attention is the fairly typical translation-model kind, based on other (preceding) words in the same sequence, whereas my attention is computed from external information.
Also, the API docs don't explain what the arguments of this step function actually are, so even after reading the demo I still don't know how to write it.
As for migrating to Fluid: I looked into it once before and the migration cost seems quite high. Our project is on a tight schedule and we probably can't fit the migration in right now, so I'd appreciate help with getting the attention I want working correctly on v2. We'll do the migration later once we have the bandwidth. Thanks a lot.

@wanghaoshuang
Contributor

wanghaoshuang commented May 23, 2018

> the attention is based on other words in this sequence

The attention in the demo I pasted above doesn't use only the other words in the current sequence; it also takes an external input, some_input_for_attention, and that input can be set up quite flexibly.

@wanghaoshuang
Contributor

Given your requirement, some_input_for_attention could be the similarity between the paragraph sequence and the query, expanded into a sequence.
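
For illustration only, assuming the standard trainer_config_helpers layers cos_sim and expand_layer, plus hypothetical variables paragraph_vec and query_vec (pooled paragraph/query vectors) and target_seq (the sequence the recurrent_group steps over), building such an expanded similarity input might look roughly like this:

# Hypothetical sketch: turn an external paragraph/query similarity score into
# a sequence-shaped input that can be fed into recurrent_group.
similarity = cos_sim(a=paragraph_vec, b=query_vec)

# Broadcast the single score along the stepped sequence so that every decoding
# step receives it as some_input_for_attention.
some_input_for_attention = expand_layer(input=similarity, expand_as=target_seq)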

@wanghaoshuang
Contributor

@pkuyym Do we have a demo showing more flexible uses of attention with the Paddle v2 API?

@April0402
Author

Is there a complete version of this demo? Or any documentation for this step function? The API docs say too little to figure out how to actually use it.
In recurrent_group(name=decoder_group_name, step=gru_decoder_with_attention, input=group_inputs), the entries of input are the arguments of the step function, right?
Also, the way memory and simple_attention are used here doesn't match what the API docs describe.
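
For reference, the entries of input in recurrent_group are passed to the step function as positional arguments, one per entry and in the same order; a short sketch of that mapping, reusing the names from the demo above:

# Sketch of the input-to-argument mapping (names from the demo above):
#   group_inputs[0] -> enc_vec                   (StaticInput, whole sequence)
#   group_inputs[1] -> enc_proj                  (StaticInput, whole sequence)
#   group_inputs[2] -> current_word              (trg_embedding, one token per step)
#   group_inputs[3] -> some_input_for_attention  (the external attention input)
decoder_out = recurrent_group(
    name=decoder_group_name,
    step=gru_decoder_with_attention,
    input=group_inputs)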

@pkuyym
Contributor

pkuyym commented May 23, 2018

@April0402 Most v2 sequence models are built on recurrent_group plus a recurrent decoder step function, and fairly complex logic can be implemented inside that step function. Some of the more involved models use this combination quite heavily; I'd suggest taking a look at:

  1. https://github.com/PaddlePaddle/models/tree/develop/mt_with_external_memory
  2. Add demo for ntm_addressing_mechanism models#56

@April0402
Author

@wanghaoshuang One question: in simple_attention, the two sequences encoded_sequence and encoded_proj can have different lengths, right?

@wanghaoshuang
Contributor

wanghaoshuang commented May 23, 2018

@April0402

> in simple_attention, the two sequences encoded_sequence and encoded_proj can have different lengths, right?

Yes, they can have different lengths. You can refer to the implementation here: https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/trainer_config_helpers/networks.py#L1400

Alternatively, you can also implement your own attention function.
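
As a rough sketch of what such a custom attention function could look like, modelled on the linked simple_attention implementation and assuming the usual trainer_config_helpers layers (expand_layer, mixed_layer, scaling_layer, pooling_layer); all names below are illustrative:

# Illustrative custom attention in the style of simple_attention.
def my_attention(encoded_sequence, encoded_proj, decoder_state, name='my_att'):
    # Project the decoder state and broadcast it along the encoded sequence.
    with mixed_layer(size=encoded_proj.size, name=name + '_state_proj') as state_proj:
        state_proj += full_matrix_projection(input=decoder_state)
    expanded = expand_layer(
        input=state_proj, expand_as=encoded_sequence, name=name + '_expand')

    # Combine the broadcast state with the per-token projection.
    with mixed_layer(size=encoded_proj.size, act=TanhActivation(),
                     name=name + '_combine') as combined:
        combined += identity_projection(input=expanded)
        combined += identity_projection(input=encoded_proj)

    # One score per token, normalized over the whole sequence.
    with mixed_layer(size=1, act=SequenceSoftmaxActivation(),
                     name=name + '_weight') as weight:
        weight += full_matrix_projection(input=combined)

    # Weighted sum of the encoded sequence -> context vector.
    scaled = scaling_layer(weight=weight, input=encoded_sequence, name=name + '_scale')
    return pooling_layer(input=scaled, pooling_type=SumPooling(), name=name + '_context')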
