More accurate Q4_0 and Q4_1 quantizations #896
Closed
Commits on Apr 11, 2023
- `126b984`: … in quantize_row_q4_0_reference and quantize_row_q4_1_reference. This reduces the difference to the vectorized versions to ~10% for quantize_row_q4_0 and <15% for quantize_row_q4_1 on the two CPUs I have tried (Ryzen 7950X and M2 Max).
- `0c9a967`
- `8b3d1f9`
- `92408cd`
- `709d235`
- `b6df974`: Reverting round() change so we can pass tests. But we should eventually switch back to nearestInt() and adapt the test.
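For reference, one common branch-free round-to-nearest implementation is the float magic-number trick; this is only a sketch of what a nearestInt() might look like, not necessarily the function from the commit. Note that float addition rounds halfway cases to even, while round() rounds them away from zero — exactly the kind of discrepancy that can break a test written against round() semantics.

```c
#include <assert.h>
#include <string.h>

/* Sketch: branch-free round-to-nearest for |fval| < 2^22 via the
 * magic-number trick.  Adding 1.5 * 2^23 forces the float's mantissa
 * to hold the rounded integer.  Halfway cases round to even
 * (0.5 -> 0, 1.5 -> 2), unlike roundf().  Illustrative only. */
static int nearest_int(float fval) {
    fval += 12582912.0f;           /* 1.5 * 2^23 */
    int i;
    memcpy(&i, &fval, sizeof(i));
    return (i & 0x007fffff) - 0x00400000;
}
```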
- `931ae36`: Somehow I had it hard-wired in my brain that quants need to be in -7...7 to be comparable to the original Q4_0. But this is clearly not the case, and if we relax this requirement this simple change brings the rmse down to 0.001966 at the expense of a somewhat longer computation (~67 seconds vs. 49 seconds for the 7B model on M2 Max). The perplexity test is still running, but it looks like the improvement compared to the previous version will be quite modest (~0.03) despite the significant improvement in MSE. The change does not affect Q4_1, as there we already use the full range of 16 possible int values.
Commits on Apr 12, 2023
- `6bfb00a`: The RMSE of the 7B model becomes 0.00185228. It looks like the perplexity will end up being around 6.27-6.28.
- `29b83e5`
- `679e1cb`: POC: Even lower rmse 4-bit Q4_0 quantization. Basically, we use two Q4_0 quantizations, each having 16 weights, to quantize a set of 32 weights. We get two separate scaling factors, which we store as fp16, ending up using the exact same 5 bits per weight as the current Q4_0. We end up with an rmse of ~0.00159, so basically the same as the improved Q4_1. But this should run faster than `Q4_1` (unless fp16 -> fp32 conversion is somehow very slow).
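The memory accounting behind "the exact same 5 bits per weight" can be checked with two illustrative struct layouts (field names are assumptions, not ggml's): one fp32 scale (4 bytes) is replaced by two fp16 scales (also 4 bytes), so the block stays at 20 bytes for 32 weights.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the memory accounting.  Layouts are illustrative. */
typedef struct {
    float   d;        /* one fp32 scale for 32 weights */
    uint8_t qs[16];   /* 32 x 4-bit quants */
} block_q4_0_fp32;    /* 4 + 16 = 20 bytes */

typedef struct {
    uint16_t d[2];    /* two fp16 scales, one per group of 16 weights */
    uint8_t  qs[16];  /* 32 x 4-bit quants */
} block_q4_0_2xfp16;  /* 4 + 16 = 20 bytes as well */
```

20 bytes * 8 bits / 32 weights = 5 bits per weight in both cases.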
Commits on Apr 13, 2023
- `6f34961`: POC: Q4_1 for groups of 16 weights. As in the last commit, but Q4_1 type, using the same memory as the existing Q4_1 via fp16. We end up with rmse 0.00125125, maxerr 0.11657715, 95pct<0.0024, median<0.0010 after a quantize-dequantize roundtrip. This is quite a bit better than Q4_1 with groups of 32 weights, but by far not as good as the 5-bit quantization that uses the same amount of memory, where we had rmse 0.00076131, maxerr 0.05273438, 95pct<0.0016, median<0.0006.
- `97d7ac7`: POC: Measure rmse of 8-bit quantization. q8_0: rmse 0.00010729, maxerr 0.01030385, 95pct<0.0002, median<0.0002.