Add partial RMSNorm "pRMSNorm" variation
The RMSNorm paper describes pRMSNorm as usually matching RMSNorm while
computing the RMS statistic over only about the first 6% of entries.

This works because the RMS estimate changes more slowly as more entries
are included, and the RMSNorm authors noted that the values they
measured were of roughly the same magnitude.
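
For readers unfamiliar with the idea, here is a minimal sketch of partial RMS normalization in plain Python. It is not the code from this repository: the function names `rms` and `prmsnorm`, the default `p=0.06`, the `eps` term, and the omission of a learned gain are all assumptions made for illustration.

```python
import math


def rms(values):
    """Root mean square of a non-empty sequence of floats."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def prmsnorm(x, p=0.06, eps=1e-8):
    """Scale x by the RMS of its first ceil(p * len(x)) entries.

    Hypothetical helper: p=1.0 gives plain RMSNorm without a learned gain.
    """
    k = max(1, math.ceil(p * len(x)))  # size of the prefix used for the estimate
    scale = rms(x[:k]) + eps           # eps guards against division by zero
    return [v / scale for v in x]


# With n_embd = 384 and p = 0.06, the statistic is computed from 24 entries.
x = [2.0, -2.0] * 192
y = prmsnorm(x)
```

When the entries have similar magnitudes, as in the toy vector above, the prefix estimate and the full RMS agree closely, which is the property the commit message relies on.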
gkielian committed Apr 18, 2024
1 parent 901e40e commit fa56821
Showing 1 changed file with 17 additions and 0 deletions.
17 changes: 17 additions & 0 deletions explorations/normalization_sweep.json
@@ -0,0 +1,17 @@
[
  {
    "max_iters": ["3500"],
    "n_layer": ["6"],
    "n_kv_group": ["6"],
    "n_head": ["6"],
    "n_embd": ["384"],
    "block_size": ["256"],
    "layernorm_variant": ["rmsnorm", "layernorm", "prmsnorm"],
    "device": ["cuda"],
    "dtype": ["float16"],
    "dataset": ["shakespeare_char"],
    "compile": [true],
    "softmax_variant_attn": ["softmax", "polymax", "saturatingconsmax"]
  }
]
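
Every value in this file is a list, which suggests that each entry describes a grid of runs. The sketch below assumes the sweep runner takes the Cartesian product of those lists; that is a guess about how the file is consumed, and `expand_grid` is a hypothetical helper, not a function from the repository.

```python
import itertools
import json

SWEEP_JSON = """
[
  {
    "max_iters": ["3500"],
    "n_layer": ["6"],
    "n_kv_group": ["6"],
    "n_head": ["6"],
    "n_embd": ["384"],
    "block_size": ["256"],
    "layernorm_variant": ["rmsnorm", "layernorm", "prmsnorm"],
    "device": ["cuda"],
    "dtype": ["float16"],
    "dataset": ["shakespeare_char"],
    "compile": [true],
    "softmax_variant_attn": ["softmax", "polymax", "saturatingconsmax"]
  }
]
"""


def expand_grid(entry):
    """Yield one dict per combination of the listed values (assumed semantics)."""
    keys = list(entry)
    for combo in itertools.product(*(entry[k] for k in keys)):
        yield dict(zip(keys, combo))


runs = [run for entry in json.loads(SWEEP_JSON) for run in expand_grid(entry)]
print(len(runs))  # 3 normalization variants x 3 softmax variants = 9
```

Under that assumption, only `layernorm_variant` and `softmax_variant_attn` vary, so the file describes nine runs that share every other setting.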
