Hi, I'm trying to use the DeepSpeed transformer kernel to build a GPT-like model, which requires attention masking. After checking the source code, I found an attention_mask parameter that it seems possible to work with, but there is no documentation for it. The bing_bert example only uses it for input (padding) masking. Can the attention_mask parameter also be used for causal attention masking?
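For reference, here is a minimal sketch of what I'm attempting. The config fields follow the bing_bert example and may differ across DeepSpeed versions; whether the kernel actually honors a full (seq, seq) additive mask like this, rather than only a per-token padding mask, is exactly what I'm asking.

```python
import torch
from deepspeed.ops.transformer import (DeepSpeedTransformerConfig,
                                       DeepSpeedTransformerLayer)

batch_size, seq_len, hidden_size, heads = 8, 128, 1024, 16

# Config fields copied from the bing_bert example; other versions of
# DeepSpeed may expect a slightly different set of arguments.
config = DeepSpeedTransformerConfig(
    batch_size=batch_size,
    hidden_size=hidden_size,
    intermediate_size=4 * hidden_size,
    heads=heads,
    attn_dropout_ratio=0.1,
    hidden_dropout_ratio=0.1,
    num_hidden_layers=1,
    initializer_range=0.02,
    local_rank=-1,
    seed=1234,
    fp16=False,
    pre_layer_norm=True,
)

# The fused kernel runs on GPU only.
layer = DeepSpeedTransformerLayer(config).cuda()
hidden_states = torch.randn(batch_size, seq_len, hidden_size).cuda()

# Additive causal mask: 0.0 where attention is allowed, -10000.0 where it
# is blocked, mirroring how bing_bert converts its padding mask before
# handing it to the kernel. Here the mask is lower-triangular so each
# position can only attend to itself and earlier positions.
causal = torch.tril(torch.ones(seq_len, seq_len))
attention_mask = (1.0 - causal) * -10000.0
attention_mask = attention_mask.view(1, 1, seq_len, seq_len).cuda()

output = layer(hidden_states, attention_mask)
```

If the kernel broadcasts and adds this mask to the raw attention scores the way the BERT-style padding mask is added, this should give causal attention; if it only supports per-token masks, it won't.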