Merge with llamacpp master #2
Commits on Jan 18, 2024
- metal : fix memory leak, dangling pointer and unused autorel (#5007) [1e605f4]
  * Metal memory: small memory leak on init, dangling pointer, and unused autorelease pool in graph compute
  * SPM header potential fix
  * Reverting symlinks
- [dcad445]
- Add Winogrande evaluation (#5015) [682986a]
  * winogrande: simple implementation. It doesn't look like it is working - why? For Mistral-7B it is barely better than random chance (score ~60% for 1267 tasks), while I see Mistral-7B scoring 78.4% on the HF leaderboard. The 1-sigma statistical uncertainty for 1267 tasks is ~1.4, so there is no way the difference is due to statistics.
  * winogrande: somewhat better. The score for Mistral-7B is now 68.9 on the validation set of winogrande_debiased. Still far from the reported 78.4, but better than before.
  * winogrande: improving. The Mistral-7B score is now 73.56. Still not quite 78.4, but getting there. We are also getting a lower score on HellaSwag compared to the HF leaderboard, so I'm not expecting we will get up to 78.4 anyway. It looks like it is better to skip the choice word(s) when evaluating the average log-likelihood. This makes sense because a more common word (in Winogrande this is often a name) will have a higher probability without knowing the follow-up context, and this will skew the log-likelihood towards the more common word. We can only do this if the choice words are not last in the sentence. It also looks like it is better to skip the punctuation at the end of the sentence, provided the choice words are not last.
  * winogrande: add dataset instructions
  Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
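The averaging trick described above (skip the choice-word tokens when they are not last in the sentence) can be sketched as follows. This is a minimal illustration with hypothetical names, not the actual llama.cpp implementation:

```python
def avg_logprob(token_logprobs, choice_start, choice_end):
    """Average log-likelihood of a sentence continuation.

    token_logprobs: per-token log-probabilities of the continuation
    choice_start, choice_end: index range covering the choice word's tokens

    The choice word is skipped only when it is not the last thing in the
    sentence; a common word (often a name in Winogrande) has a high
    unconditional probability, which would skew the average toward it.
    """
    if choice_end < len(token_logprobs):
        kept = token_logprobs[:choice_start] + token_logprobs[choice_end:]
    else:
        # Choice word is last: there is no follow-up context, keep everything.
        kept = token_logprobs
    return sum(kept) / len(kept)
```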
- perplexity : faster HellaSwag via batching (#5017) [ad19812]
  * perplexity : faster HellaSwag
  * perplexity : clean-up
  * perplexity : no need for decode_helper
  * perplexity : add comments
  * perplexity : option to specify max batched tasks via `n_parallel`
  * perplexity : remove HellaSwag restriction for n_batch
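The batching idea behind this change can be sketched as grouping evaluation tasks so that up to `n_parallel` of them are scored per decode call instead of one at a time. This is an illustrative toy, not the real llama.cpp code:

```python
def batch_tasks(tasks, n_parallel):
    """Split a flat list of evaluation tasks into batches of at most
    n_parallel, so several HellaSwag endings can share one decode call."""
    return [tasks[i:i + n_parallel] for i in range(0, len(tasks), n_parallel)]
```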
- HellaSwag: speed up by parallelizing log-prob evaluation (#5020) [3e945cc]
  For Mistral-7B and fp16, time on my system goes down from 536 seconds to 423 seconds for the full evaluation dataset (10042 tasks).
  Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
- convert.py : fix llama/llama2 conversion due to vocab_size=-1 (#5019) [b467577]
- [e9240cd]
- [d391ae9]
- [2d5419d]
- [96d7f56]
- server : defer tasks when "slot unavailable" (#5018) [821f0a2]
  * server: defer task when no slot is available
  * remove unnecessary log
  Co-authored-by: Xuan Son Nguyen <xuanson.nguyen@snowpack.eu>
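The scheduling pattern in this change (queue a request instead of rejecting it when every slot is busy) can be sketched as below. All names here are illustrative, not the server's actual API:

```python
from collections import deque

class SlotScheduler:
    """Toy sketch: tasks get a free slot immediately if one exists;
    otherwise they are deferred and retried when a slot is released."""

    def __init__(self, n_slots):
        self.free_slots = list(range(n_slots))
        self.deferred = deque()

    def submit(self, task):
        # Returns a slot id, or None if the task was deferred.
        if self.free_slots:
            return self.free_slots.pop()
        self.deferred.append(task)  # no slot available: defer, don't fail
        return None

    def release(self, slot_id):
        # Hand a freed slot to the oldest deferred task, if any.
        if self.deferred:
            return (self.deferred.popleft(), slot_id)
        self.free_slots.append(slot_id)
        return None
```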
- [9b6ea42]
- llama : fix falcon arch for tied output embeddings (#4978) [57e2a7a]
  * falcon arch fix for tied output embeddings
  * Update llama.cpp (several review iterations)
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Commits on Jan 19, 2024
- perplexity : faster Winogrande via batching (#5024) [8b20858]
  * perplexity : faster Winogrande via batching
  * perplexity : remove unused function
  * perplexity : only tokenize selected tasks for Winogrande
- perplexity: avoid unnecessary allocations and logit copies (#5035) [993fba8]
  Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
- llama : add CodeShell support (#5016) [2b3b999]
  * llama: add codeshell support
  * llama.cpp: fix codeshell with NeoX rope
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
- winogrande: evaluate log-probs in parallel (#5036) [7051aac]
  This is a relatively minor performance tweak resulting in ~10% speedup on my system.
  Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
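The idea of scoring log-probs concurrently instead of in a serial loop can be sketched as follows. This is a toy with hypothetical names: in pure Python the GIL prevents a real speedup for CPU-bound math, whereas llama.cpp's native threads do benefit.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def log_prob(logits, target):
    # log P(target) = logits[target] - logsumexp(logits), computed stably
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[target] - lse

def score_parallel(items, workers=4):
    # items: (logits, target) pairs; score them concurrently rather than serially
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda it: log_prob(*it), items))
```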
- [de9a147]
- [9b75cb2]
- [a5cacb2]
- finetune : fix ggml_allocr lifetimes (tmp workaround) (#5033) [381ee19]
  * Fix issue with alloc causing max_compute_size to be calculated
  * remove ggml_allocr_free as suggested in issue #4791