added query-key norm to accommodate OLMo2 #1894

ysjprojects · 2024-12-28T22:14:50Z

query_states = self.q_norm(self.q_proj(hidden_states))
key_states = self.k_norm(self.k_proj(hidden_states))
value_states = self.v_proj(hidden_states)

https://github.com/huggingface/transformers/blob/main/src/transformers/models/olmo2/modeling_olmo2.py

OLMo2 applies RMSNorm to the q and k matrices in its attention layer, something that is not yet supported by litgpt's architecture.

To support OLMo2, this PR adds a config.norm_qk option that normalizes the q and k matrices. It defaults to False.

Currently, the q/k norm is assumed to use the same norm class as the rest of the model.
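As a rough illustration of the option described above, here is a minimal sketch of an opt-in q/k norm. It is not the code in this PR: `ToyConfig`, `ToyAttentionProjections`, and the local `RMSNorm` are hypothetical stand-ins, and only the `norm_qk` flag and its `False` default come from the description.

```python
from dataclasses import dataclass

import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Plain RMSNorm over the last dimension (illustrative, not litgpt's class)."""

    def __init__(self, size: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x / rms * self.weight


@dataclass
class ToyConfig:
    # Hypothetical stand-in for the real config; only norm_qk comes from the PR.
    n_embd: int = 16
    norm_qk: bool = False  # off by default, as in the PR description


class ToyAttentionProjections(nn.Module):
    """Computes q, k, v and optionally normalizes q and k."""

    def __init__(self, config: ToyConfig) -> None:
        super().__init__()
        self.q_proj = nn.Linear(config.n_embd, config.n_embd, bias=False)
        self.k_proj = nn.Linear(config.n_embd, config.n_embd, bias=False)
        self.v_proj = nn.Linear(config.n_embd, config.n_embd, bias=False)
        # nn.Identity keeps forward() free of branches when the option is off.
        self.norm_q = RMSNorm(config.n_embd) if config.norm_qk else nn.Identity()
        self.norm_k = RMSNorm(config.n_embd) if config.norm_qk else nn.Identity()

    def forward(self, x: torch.Tensor):
        q = self.norm_q(self.q_proj(x))
        k = self.norm_k(self.k_proj(x))
        v = self.v_proj(x)  # values are never normalized
        return q, k, v


torch.manual_seed(0)
x = torch.randn(2, 5, 16)  # (B, T, n_embd)
q, k, v = ToyAttentionProjections(ToyConfig(norm_qk=True))(x)
```

With `norm_qk=True` and the norm weight at its initial value of one, every q and k vector leaves the module with a root mean square close to 1. With the default config, both norms are no-ops.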

Andrei-Aksionov · 2024-12-30T16:21:24Z

litgpt/model.py

+            k = k.reshape(B, T, -1)  # (B, T, nh_k * hs)
+            k = self.norm_k(k)
+            k = k.view(B, T, self.config.n_query_groups, self.config.head_size)
+


Is there a reason why you do this normalization here and not right after the qkv.split?
If you move it there, you will not have to do .reshape and .view again.
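The suggestion can be checked with a small sketch. It assumes that the split yields k with a flat last dimension of `n_query_groups * head_size`; the sizes, the random `qkv` tensor, and the weightless `rms_norm` helper are all hypothetical and not taken from litgpt.

```python
import torch

B, T = 2, 3
n_head, n_query_groups, head_size = 4, 2, 8  # hypothetical sizes
q_size = n_head * head_size
kv_size = n_query_groups * head_size


def rms_norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Weightless RMSNorm over the last dimension, for illustration only."""
    return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)


torch.manual_seed(0)
qkv = torch.randn(B, T, q_size + 2 * kv_size)  # stand-in for the fused projection
q, k, v = qkv.split((q_size, kv_size, kv_size), dim=-1)

# Early: normalize right after the split, then reshape into heads once.
k_early = rms_norm(k).reshape(B, T, n_query_groups, head_size)

# Late: reshape into heads first, then flatten, normalize, and reshape back.
k_heads = k.reshape(B, T, n_query_groups, head_size)
k_late = rms_norm(k_heads.reshape(B, T, -1)).reshape(B, T, n_query_groups, head_size)

same_result = torch.allclose(k_early, k_late)

# Normalizing each head separately (over head_size) is a different operation.
k_per_head = rms_norm(k_heads)
per_head_differs = not torch.allclose(k_early, k_per_head)
```

Under these assumptions both orderings give the same tensor, so the early placement only saves the extra reshape round trip. The per-head variant is included to show why the norm has to see the flattened `nh_k * hs` dimension rather than a single head.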

added query-key norm to accomodate OLMo2

0b4629a

ysjprojects requested review from rasbt and lantiga as code owners December 28, 2024 22:14

Andrei-Aksionov reviewed Dec 30, 2024

Andrei-Aksionov and others added 3 commits December 30, 2024 19:24

Add rerun on failures for test_readme/download[model,books]

5f9df57

Merge branch 'main' into norm_qk

d1ebefa

Merge branch 'main' into norm_qk

2a434ff
