YAML logging and presets #2657
I implemented support for the perplexity binary. I am logging the individual token probabilities because I suspect that just looking at the average negative log-likelihood of the token probabilities is a suboptimal metric for judging quantization quality loss. This is a simple plot where I just histogrammed the probabilities: The corresponding code is:
Even in this very simple analysis it becomes apparent that there are two clusters in terms of token probabilities: one close to 0 and one close to 1. I would argue that the cluster close to 0 is largely irrelevant because the model is essentially guessing and it doesn't matter whether it's correct 1% of the time or 0.1% of the time. However, because perplexity is computed from the logarithm of the probabilities, this tenfold drop counts exactly as much as a change in probability from 50% to 5% or from 100% to 10%, which would mean a much larger loss of quality. I'm not yet 100% sure what a better metric would be but I think part of it should be to ignore the perplexity of those tokens where the unquantized model was already performing extremely poorly.
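To make the arithmetic concrete, here is a minimal sketch (not the code from this PR; the function names are made up for illustration). It only shows that the negative log-likelihood depends on the ratio of the two probabilities, so every tenfold drop adds the same amount, log(10), regardless of where it starts.

```python
import math


def nll(p: float) -> float:
    """Negative log-likelihood of a single token probability p in (0, 1]."""
    return -math.log(p)


def nll_increase(p_before: float, p_after: float) -> float:
    """How much the NLL of one token grows when its probability changes."""
    return nll(p_after) - nll(p_before)


# Three very different situations, all a tenfold drop in probability.
for p_before, p_after in [(0.01, 0.001), (0.5, 0.05), (1.0, 0.1)]:
    print(f"{p_before} -> {p_after}: NLL increase = {nll_increase(p_before, p_after):.6f}")
```

All three lines print the same increase, which is why an average over the NLL cannot tell a harmless change among near-zero probabilities apart from a serious one among confident predictions.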
These are some implementation related comments that I would like to get addressed before merging
Not directly related to the PR but making another post because I find the results to be interesting. I made another histogram where I weighted the frequency of the probabilities with their negative log-likelihoods: This is the corresponding code:
As it turns out the absolute perplexity value is overwhelmingly dominated by low-probability tokens and the metric is thus most sensitive to small absolute probability changes for those tokens. Edit: the binning in this histogram is chosen in such a way that the height of each bin equals the percentage that the bin contributes to the total.
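A rough sketch of this kind of weighted binning, written without plotting libraries so it runs anywhere. This is not the code used for the plot: the function name, the number of bins, and the sample probabilities are all assumptions chosen for illustration.

```python
import math


def nll_contribution_percent(probs, n_bins=10):
    """Percentage of the summed NLL that falls into each probability bin.

    Assumes every probability is in (0, 1] and at least one is below 1,
    so that the total NLL is positive.
    """
    totals = [0.0] * n_bins
    for p in probs:
        # Equal-width bins over [0, 1]; p == 1.0 is clamped into the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        totals[index] += -math.log(p)
    total_nll = sum(totals)
    return [100.0 * t / total_nll for t in totals]


# Made-up sample: three poorly predicted tokens, one coin flip, three confident ones.
sample = [0.001, 0.01, 0.05, 0.5, 0.9, 0.99, 0.999]
for i, share in enumerate(nll_contribution_percent(sample)):
    print(f"bin {i}: {share:5.1f}% of total NLL")
```

Even in this tiny sample the lowest bin accounts for the vast majority of the total, which matches the qualitative picture described above.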
e0f5140 to 233f7cd
Using the code in this PR I think I've found a good metric for estimating the precision loss from quantization: the root mean square of the token probabilities relative to the unquantized model. This metric is sensitive to those probabilities not close to 0 or 1 (which seem to be largely unaffected by quantization): There is still a lot that would need to be done for this PR but I could instead focus on adding this metric to the perplexity binary. But although I would try my best to make a good case for it in the corresponding PR it would require some degree of just trusting that I did everything correctly if my data is based on unmerged code. |
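One possible reading of this metric, as a hedged sketch: take the per-token probabilities from the unquantized and the quantized run on the same text and compute the root mean square of their differences. The function name and the sample values are hypothetical, and the exact definition used for the plot may differ.

```python
import math


def rms_probability_change(p_ref, p_quant):
    """Root mean square of per-token probability differences.

    p_ref:   probabilities assigned to the correct tokens by the unquantized model
    p_quant: probabilities assigned to the same tokens by the quantized model
    """
    if len(p_ref) != len(p_quant) or not p_ref:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(q - r) ** 2 for r, q in zip(p_ref, p_quant)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up values: the tokens near 0 and 1 barely move, the one in the middle does.
reference = [0.999, 0.5, 0.001]
quantized = [0.998, 0.4, 0.002]
print(rms_probability_change(reference, quantized))
```

Because the differences are absolute rather than logarithmic, a change from 0.1% to 0.2% contributes almost nothing, while a change from 50% to 40% dominates the result.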
233f7cd to e529aea
This PR should now be feature complete except for directory creation on Windows. I'll probably get to it on Friday. Edit: Logging of the hellaswag score is not implemented and I'll still need to ensure that I didn't break anything. Running |
It appears specifying a path is also required & no logging occurs if unset. I want a yaml file produced, but I'm unclear on how to do so. I figured |
I'm trying to understand, but I don't understand stderr or what you mean by it. There are no warnings that I see. Here's my attempt: Other than adding
The log only saves if main completes normally, not if you kill it with CTRL+C.
😅 whoops. I guess I'll lower the
I get a segfault if I don't pass a prompt, because prompt_tokens is empty and dump_vector_int_yaml assumes all vectors have at least one element.
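For illustration only, here is the shape of the guard in Python rather than the C++ of the PR. The helper below is hypothetical and merely mirrors the "print everything but the last element, then the last element" pattern, which fails on an empty list unless that case is handled first.

```python
def dump_int_list_yaml(name, values):
    """Format a list of ints as a one-line YAML flow sequence.

    Hypothetical Python analogue of a dump helper; not the PR's C++ code.
    """
    if not values:
        # Without this guard, values[-1] below would raise IndexError,
        # the Python counterpart of reading past an empty vector.
        return f"{name}: []"
    head = "".join(f"{v}, " for v in values[:-1])
    return f"{name}: [{head}{values[-1]}]"


print(dump_int_list_yaml("prompt_tokens", []))
print(dump_int_list_yaml("prompt_tokens", [1, 15043, 3186]))
```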
I think newest commit helps with that by adding a Even if it stopped when it reaches max context, I have to kill it to end the program.. but also, it's not respecting
@JackJollimore it seems I misinterpreted what the code does. I saw the timings when you interrupt with CTRL+C and was assuming it would jump to the end of the program where I also added the logging code. I'll fix the generation of YAML files on interrupt on Friday.
You need to set |
I wanted to ask if interrupt generations could be done, so that's great. Thanks for a PR like this.
Bug with Please merge master into this PR when you get a chance. |
6094648 to f18cada
Alright, this should now be mostly good to merge from my end (I still need to test it some more).
Working:
Full generation.
examples/perplexity/perplexity.cpp (Outdated)
}

void perplexity(llama_context * ctx, const gpt_params & params) {
std::tuple<std::vector<llama_token>, std::vector<float>, std::vector<float>, float>
I find std::tuple syntax to be quite cumbersome and unreadable, so I normally avoid it by using a simple struct.
Can we change these to struct results_ppl and earlier to struct results_log_softmax?
f18cada to 6a30dff
This PR adds the ability to log all input and output parameters to YAML files as well as a Python script run_with_preset.py that takes a YAML file as input and uses it to run the llama.cpp binaries with the specified CLI arguments. The usage is python run_with_preset.py path/to/preset/file.yml. Currently only one preset file can be specified. I plan to make it so that you will be able to specify additional CLI arguments to the Python script that override the presets. This PR is intended as a replacement for #2557 and it has vaguely similar uses as #2626. By specifying the CLI argument --logdir, a YAML file such as this can be created:
This YAML file could then be used as input for run_with_preset.py to reproduce the generation or a custom YAML file could be written that specifies only a subset of all CLI arguments, for example:
This PR is still very much WIP, for now only the main binary is supported and I still need to do more testing. I made the following design decisions for the implementation:
--logdir CLI argument is set.
cpu_has_blas would make sense to move into the header.
<YEAR>-<MONTH>-<DAY>T<HOURS>:<MINUTES>:<SECONDS>.<NANOSECONDS>. The timestamps are also used as filenames. I chose the ordering of month before day in order for alphabetical sorting to align with the temporal order. I added the nanoseconds to make it very unlikely for two processes to write to the same file.
std::experimental::filesystem to create the log directory if it does not exist and to construct the path to the output file in an OS-agnostic manner. In later C++ versions this functionality has become part of std.
gpt_params property for escaping special characters in the prompt.
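The timestamp and path handling described above can be sketched in Python as follows. This is an illustration, not the PR's C++ implementation: the function names are hypothetical, and the use of UTC and of a .yml extension are assumptions on my part.

```python
import os
import time


def format_timestamp(ns_since_epoch: int) -> str:
    """<YEAR>-<MONTH>-<DAY>T<HOURS>:<MINUTES>:<SECONDS>.<NANOSECONDS>"""
    seconds, nanoseconds = divmod(ns_since_epoch, 1_000_000_000)
    broken_down = time.gmtime(seconds)  # assumption: UTC rather than local time
    # Zero-padded fields keep alphabetical order identical to temporal order.
    return time.strftime("%Y-%m-%dT%H:%M:%S", broken_down) + f".{nanoseconds:09d}"


def make_log_path(logdir: str, ns_since_epoch: int) -> str:
    """Create logdir if needed and return the path of the log file inside it."""
    os.makedirs(logdir, exist_ok=True)
    # os.path.join picks the right separator for the current OS.
    return os.path.join(logdir, format_timestamp(ns_since_epoch) + ".yml")


print(format_timestamp(time.time_ns()))
```

Two processes would have to start within the same nanosecond to end up with the same filename, which is the collision argument made above.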