In typical dot-product attention, the logits (the input matrix of the softmax) are supposed to be divided by the square root of the temperature, as in the equation below (the standard scaled dot-product attention, where the temperature is the key dimension $d_k$):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

However, in this code the logits are simply divided by the temperature, without a square root. Is this correct or a bug? If it is correct, could you explain why you didn't apply the square root?
Referenced code: tensor2tensor/tensor2tensor/layers/area_attention.py, line 415 at commit 5623deb
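For reference, here is a minimal numpy sketch contrasting the two scalings being asked about (function names and shapes are illustrative, not the tensor2tensor API): the standard form divides the logits by √d_k, while the form in the code divides by a raw temperature, which coincides with the standard form only if the caller passes temperature = √d_k.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # Standard Transformer scaling: divide the logits by sqrt(d_k).
    d_k = q.shape[-1]
    logits = q @ k.T / np.sqrt(d_k)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v

def temperature_scaled_attention(q, k, v, temperature):
    # Scaling as described in the question: divide the logits by a raw
    # temperature, with no square root. This matches the standard form
    # only when the caller passes temperature = sqrt(d_k).
    logits = q @ k.T / temperature
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# The two agree exactly when temperature = sqrt(d_k):
q, k, v = (np.random.randn(4, 8) for _ in range(3))
assert np.allclose(scaled_dot_product_attention(q, k, v),
                   temperature_scaled_attention(q, k, v, np.sqrt(8)))
```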