k_d dimension #3

xpivan · 2023-02-28T05:14:48Z

xpivan
Feb 28, 2023

Hi,

Thanks for providing notebook aside of your books - Just bought it a few days ago and loving it.

One question from the Multi_Head_attention notebook on CH02:

print("Step 4: Scaled Attention Scores")
k_d=1 #square root of k_d=3 rounded down to 1 for this example
attention_scores = (Q @ K.transpose())/k_d
print(attention_scores)

In the line in the comment, shouldn't it be k_d = 4 ?
3 being the number of inputs in x and 4 being the number of dimension ?
My question is why is k_d different than d_model ?

Denis2054 · 2023-02-28T09:34:41Z

Denis2054
Feb 28, 2023
Maintainer

Dear Xavier, Thank you for your message and question. I'll check this and get back to you in a few days (I'm traveling right now). In the meantime, you can continue because it doesn't change the reasoning behind this small example. Best regards, Denis

…

On Tue, Feb 28, 2023, 6:15 AM xavier ***@***.***> wrote: Hi, Thanks for providing notebook aside of your books - Just bought it a few days ago and loving it. One question from the Multi_Head_attention notebook on CH02: print("Step 4: Scaled Attention Scores") k_d=1 #square root of k_d=3 rounded down to 1 for this example attention_scores = (Q @ K.transpose())/k_d print(attention_scores) In the line in the comment, shouldn't it be k_d = 4 ? 3 being the number of inputs in x and 4 being the number of dimension ? — Reply to this email directly, view it on GitHub <#3>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AHLCIVQ77JS3KP6VTNID5T3WZWCVHANCNFSM6AAAAAAVKID2C4> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.*** com>

0 replies

Denis2054 · 2023-02-28T10:47:31Z

Denis2054
Feb 28, 2023
Maintainer

Dear Xavier, I managed to access the code. The code is OK. However, You're right. The code shows that the k dimensions were simplified to 1 for the example. I'll update the comment in the next few days. Best regards, Denis On Tue, Feb 28, 2023, 10:34 AM Denis Rothman ***@***.***> wrote:

…

Dear Xavier, Thank you for your message and question. I'll check this and get back to you in a few days (I'm traveling right now). In the meantime, you can continue because it doesn't change the reasoning behind this small example. Best regards, Denis On Tue, Feb 28, 2023, 6:15 AM xavier ***@***.***> wrote: > Hi, > > Thanks for providing notebook aside of your books - Just bought it a few > days ago and loving it. > > One question from the Multi_Head_attention notebook on CH02: > > print("Step 4: Scaled Attention Scores") > k_d=1 #square root of k_d=3 rounded down to 1 for this example > attention_scores = (Q @ K.transpose())/k_d > print(attention_scores) > > In the line in the comment, shouldn't it be k_d = 4 ? > 3 being the number of inputs in x and 4 being the number of dimension ? > > — > Reply to this email directly, view it on GitHub > <#3>, > or unsubscribe > <https://github.com/notifications/unsubscribe-auth/AHLCIVQ77JS3KP6VTNID5T3WZWCVHANCNFSM6AAAAAAVKID2C4> > . > You are receiving this because you are subscribed to this thread.Message > ID: ***@***.*** > .com> >

1 reply

Denis2054 Mar 1, 2023
Maintainer

I updated the comment which is now "simplified" instead of "rounded down"
Thanks for pointing this out.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

k_d dimension #3

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 2 comments 1 reply

{{title}}

{{title}}

{{title}}

Select a reply

k_d dimension #3

xpivan Feb 28, 2023

Replies: 2 comments · 1 reply

Denis2054 Feb 28, 2023 Maintainer

Denis2054 Feb 28, 2023 Maintainer

Denis2054 Mar 1, 2023 Maintainer

xpivan
Feb 28, 2023

Replies: 2 comments 1 reply

Denis2054
Feb 28, 2023
Maintainer

Denis2054
Feb 28, 2023
Maintainer

Denis2054 Mar 1, 2023
Maintainer