Thank you and your team for your work; this is excellent research. I have one question: in the original model, the queries (q), keys (k), and values (v) all pass through linear layers, but in your newly designed model, v is obtained without passing through a linear layer. Why was this design choice made, and are there any related ablation experiments? Thank you for your reply.
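To make the question concrete, here is a minimal sketch of the two variants as I understand them, written in plain Python. It assumes single-head scaled dot-product attention, which may differ from the actual implementation; the names `attention`, `w_q`, `w_k`, and `w_v` are illustrative and not taken from the repository.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(x, w_q, w_k, w_v=None):
    """Single-head scaled dot-product attention (illustrative only).

    If w_v is None, the values are the raw input x, i.e. v skips the
    linear layer, which is the variant the question is about.
    """
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = x if w_v is None else matmul(x, w_v)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or input features."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_tokens, dim = 3, 4
x = random_matrix(n_tokens, dim, rng)
w_q = random_matrix(dim, dim, rng)
w_k = random_matrix(dim, dim, rng)
w_v = random_matrix(dim, dim, rng)

# Variant A: q, k, and v all pass through linear layers.
out_projected, weights_a = attention(x, w_q, w_k, w_v)
# Variant B: only q and k are projected; v is the input itself.
out_identity, weights_b = attention(x, w_q, w_k)
```

In this sketch the attention weights are identical in both variants, since they depend only on q and k. The difference is that without a value projection each output row is a weighted average of the input rows themselves, so the layer can only mix existing token features rather than map them into a new space.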