Replies: 2 comments 1 reply
-
Dear Xavier,
Thank you for your message and question.
I'll check this and get back to you in a few days (I'm traveling right now).
In the meantime, you can continue because it doesn't change the reasoning behind this small example.
Best regards,
Denis
-
Dear Xavier,
I managed to access the code.
The code is OK.
However, you're right about the comment: the scaling factor, the square root of k_d=3, was rounded down to 1 to simplify the example.
I'll update the comment in the next few days.
Best regards,
Denis
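For readers following the thread, here is a minimal sketch of what the unsimplified scaling would look like, with the full square root of k_d written out instead of the rounded-down divisor of 1. The Q and K values below are illustrative placeholders, not the book's matrices; only the shapes (3 inputs, k_d = 3) match the notebook's example.
import numpy as np

# Illustrative Q and K with the notebook's shapes: 3 inputs, 3-dimensional keys.
# These values are made up for the sketch, not taken from the book.
Q = np.array([[1.0, 0.0, 2.0],
              [2.0, 2.0, 2.0],
              [2.0, 1.0, 3.0]])
K = np.array([[0.0, 1.0, 1.0],
              [4.0, 4.0, 0.0],
              [2.0, 3.0, 1.0]])

k_d = K.shape[1]                      # key dimension: 3
scale = np.sqrt(k_d)                  # ~1.73; the notebook rounds this down to 1
attention_scores = (Q @ K.T) / scale
print(attention_scores)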
-
Hi,
Thanks for providing notebooks alongside your books - I just bought it a few days ago and am loving it.
One question about the Multi_Head_attention notebook in CH02:
print("Step 4: Scaled Attention Scores")
k_d=1 #square root of k_d=3 rounded down to 1 for this example
attention_scores = (Q @ K.transpose())/k_d
print(attention_scores)
In the comment on that line, shouldn't it be k_d = 4? 3 being the number of inputs in x and 4 being the number of dimensions?
My question is: why is k_d different from d_model?
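To make the dimension question concrete, here is a small sketch, assuming illustrative random weights rather than the book's actual matrices. It shows why k_d can differ from d_model: each input in x has d_model = 4 dimensions, but the weight matrices project every input down to 3 dimensions, so the query and key vectors live in a k_d = 3 space, and the scaling divisor comes from that projected dimension.
import numpy as np

d_model = 4   # width of each input embedding in x
k_d = 3       # width of each projected query/key vector

# 3 inputs, each of dimension d_model = 4 (illustrative random values)
x = np.random.rand(3, d_model)

# Projection matrices map d_model -> k_d (illustrative random weights)
W_q = np.random.rand(d_model, k_d)
W_k = np.random.rand(d_model, k_d)

Q = x @ W_q   # shape (3, k_d)
K = x @ W_k   # shape (3, k_d)

# The scaling uses the key dimension k_d = 3, not d_model = 4:
attention_scores = (Q @ K.T) / np.sqrt(k_d)
print(attention_scores.shape)   # (3, 3): one score per pair of inputs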