Merge with llamacpp master #2

Merged: 21 commits, Jan 20, 2024

Commits on Jan 18, 2024

  1. metal : fix memory leak, dangling pointer and unused autorelease pool (#5007)

    * Metal memory: Small memory leak on init, dangling pointer, and unused autorelease pool in graph compute
    
    * SPM header potential fix
    
    * Reverting symlinks
    ptsochantaris authored Jan 18, 2024
    Commit 1e605f4
  2. Commit dcad445
  3. Add Winogrande evaluation (#5015)

    * winogrande: simple implementation
    
    It doesn't look like it is working - why?
    For Mistral-7B it is barely better than random chance
    (score ~60% over 1267 tasks), while the HF leaderboard
    reports 78.4% for Mistral-7B. The 1-sigma statistical
    uncertainty over 1267 tasks is ~1.4%, so there is no way
    the difference is due to statistics.
    
    * winogrande: somewhat better
    
    Score for Mistral-7B is now 68.9 on the validation set of
    winogrande_debiased. Still far from the reported 78.4, but
    better than what I had before.
    
    * winogrande: improving
    
    Mistral-7B score is now 73.56.
    Still not quite 78.4 but getting there.
    We are also getting a lower score on HellaSwag
    compared to the HF leaderboard, so I'm not expecting
    we will get up to 78.4 anyway.
    
    It looks like it is better to skip the choice word(s)
    when evaluating the average log-likelihood. This kind of
    makes sense because a more common word (in Winogrande this is
    often a name) will have a higher probability without knowing
    about the follow-up context, and this will skew the log-likelihood
    towards the more common word. We can only do this if the
    choice words are not last in the sentence.
    
    It also looks like it is better to skip the punctuation at the
    end of the sentence, provided the choice words are not last.
    
    * winogrande: add dataset instructions
    
    ---------
    
    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
    ikawrakow and Kawrakow authored Jan 18, 2024
    Commit 682986a
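    A minimal sketch of the skip-the-choice-word scoring described above
    (hypothetical Python, not the PR's C++ implementation):

    ```python
    def avg_logprob(token_logprobs, choice_span,
                    skip_choice=True, skip_end_punct=True):
        """Average log-likelihood of the sentence ending for one choice.

        token_logprobs : list of (token_text, logprob) pairs for the ending
        choice_span    : (start, end) token index range of the choice word(s)
        """
        start, end = choice_span
        n = len(token_logprobs)
        choice_is_last = end >= n  # can only skip if something follows
        kept = []
        for i, (tok, lp) in enumerate(token_logprobs):
            # Skip the choice word(s): a frequent word (often a name in
            # Winogrande) gets a high log-prob regardless of context,
            # skewing the average toward the more common choice.
            if skip_choice and not choice_is_last and start <= i < end:
                continue
            # Skip final punctuation too, again only when the choice
            # word(s) are not last in the sentence.
            if skip_end_punct and not choice_is_last and i == n - 1 and tok in ".!?":
                continue
            kept.append(lp)
        return sum(kept) / len(kept)
    ```

    The choice whose ending scores the higher average log-likelihood wins
    the task.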
  4. perplexity : faster HellaSwag via batching (#5017)

    * perplexity : faster HellaSwag
    
    ggml-ci
    
    * perplexity : clean-up
    
    ggml-ci
    
    * perplexity : no need for decode_helper
    
    ggml-ci
    
    * perplexity : add comments
    
    * perplexity : option to specify max batched tasks via `n_parallel`
    
    * perplexity : remove HellaSwag restriction for n_batch
    ggerganov authored Jan 18, 2024
    Commit ad19812
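    A rough sketch of the batching idea (hypothetical Python;
    `model.batch_logprobs` is an assumed stand-in for one batched decode,
    not a real llama.cpp API):

    ```python
    def score_endings_batched(model, endings, n_parallel=8):
        """Score many tokenized endings per forward pass instead of
        issuing one decode call per ending."""
        scores = []
        for i in range(0, len(endings), n_parallel):
            batch = endings[i:i + n_parallel]
            # One batched forward pass; each sequence keeps its own id so
            # the returned per-token log-probs can be split per task.
            per_seq = model.batch_logprobs(batch)
            scores.extend(sum(lps) / len(lps) for lps in per_seq)
        return scores
    ```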
  5. HellaSwag: speed up by parallelizing log-prob evaluation (#5020)

    For Mistral-7B and fp16, time on my system goes down from 536 seconds
    to 423 seconds for the full evaluation dataset (10042 tasks).
    
    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
    ikawrakow and Kawrakow authored Jan 18, 2024
    Commit 3e945cc
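    One way to picture the parallel log-prob evaluation (illustrative
    NumPy sketch, not the PR's C++ code):

    ```python
    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    def log_softmax(row):
        # Numerically stable log-softmax over the vocabulary.
        x = row - row.max()
        return x - np.log(np.exp(x).sum())

    def target_logprobs(logit_rows, targets, n_threads=8):
        """log P(target) for each logits row, computed in parallel; the
        log-softmax over a ~32k vocabulary dominates the cost, and NumPy
        releases the GIL for the heavy array work."""
        def one(pair):
            row, tok = pair
            return float(log_softmax(row)[tok])

        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            return list(pool.map(one, zip(logit_rows, targets)))
    ```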
  6. convert.py : fix llama/llama2 conversion due to vocab_size=-1 (#5019)

    PR #4818 (merged last week) reintroduced a config check for vocab_size that was addressed in PR #4258 (merged 2023-11-30).
    
    Without the fix, llama2 models can't be converted. The error is:
    
    `ValueError: The model's vocab size is set to -1 in params.json. Please update it manually. Maybe 32000?`
    databyte authored Jan 18, 2024
    Commit b467577
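    The guard amounts to something like this (hypothetical sketch, not
    convert.py's actual code):

    ```python
    def resolve_vocab_size(params: dict, tokenizer_vocab_size: int) -> int:
        # Some llama/llama2 checkpoints ship params.json with
        # vocab_size = -1; fall back to what the tokenizer contains.
        vocab_size = params.get("vocab_size", -1)
        if vocab_size is None or vocab_size < 0:
            return tokenizer_vocab_size
        return vocab_size
    ```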
  7. Commit e9240cd
  8. Commit d391ae9
  9. Commit 2d5419d
  10. Commit 96d7f56
  11. server : defer tasks when "slot unavailable" (#5018)

    * server: defer task when no slot is available
    
    * remove unnecessary log
    
    ---------
    
    Co-authored-by: Xuan Son Nguyen <xuanson.nguyen@snowpack.eu>
    ngxson and Xuan Son Nguyen authored Jan 18, 2024
    Commit 821f0a2
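    The deferral logic can be sketched as (toy Python, not the server's
    implementation):

    ```python
    from collections import deque

    class SlotScheduler:
        """With all slots busy, queue the task instead of failing it,
        and pick it up again when a slot frees."""

        def __init__(self, n_slots):
            self.free_slots = list(range(n_slots))
            self.deferred = deque()

        def submit(self, task):
            if self.free_slots:
                self.dispatch(task, self.free_slots.pop())
            else:
                self.deferred.append(task)  # defer instead of erroring

        def release(self, slot):
            self.free_slots.append(slot)
            if self.deferred:
                self.submit(self.deferred.popleft())

        def dispatch(self, task, slot):
            print(f"running {task!r} on slot {slot}")
    ```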
  12. Commit 9b6ea42
  13. llama : fix falcon arch for tied output embeddings (#4978)

    * falcon arch fix for tied output embeddings
    
    * Update llama.cpp
    
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    
    * Update llama.cpp
    
    * Update llama.cpp
    
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    
    * Update llama.cpp
    
    ---------
    
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    cmp-nct and ggerganov authored Jan 18, 2024
    Commit 57e2a7a
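    The tied-embeddings fallback boils down to the following (illustrative
    sketch; the tensor names are assumptions, not necessarily llama.cpp's):

    ```python
    def output_projection(tensors: dict):
        head = tensors.get("output.weight")
        if head is not None:
            return head
        # Tied output embeddings: the LM head reuses the token-embedding
        # matrix, i.e. logits = hidden_states @ tok_embeddings.T
        return tensors["token_embd.weight"]
    ```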

Commits on Jan 19, 2024

  1. perplexity : faster Winogrande via batching (#5024)

    * perplexity : faster Winogrande via batching
    
    ggml-ci
    
    * perplexity : remove unused function
    
    * perplexity : only tokenize selected tasks for Winogrande
    ggerganov authored Jan 19, 2024
    Commit 8b20858
  2. perplexity: avoid unnecessary allocations and logit copies (#5035)

    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
    ikawrakow and Kawrakow authored Jan 19, 2024
    Commit 993fba8
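    The kind of change involved, as a Python analogy (not the PR's code):

    ```python
    import numpy as np

    def gather_target_logits(logits, targets, out=None):
        """Pull only the target-token logits out of the
        (n_tokens, n_vocab) matrix instead of copying whole rows, writing
        into a caller-owned buffer that is allocated once and reused."""
        n = len(targets)
        if out is None or out.shape[0] < n:
            out = np.empty(n, dtype=np.float32)  # one allocation, reused
        out[:n] = logits[np.arange(n), targets]  # no per-row copies
        return out[:n]
    ```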
  3. llama : add CodeShell support (#5016)

    * llama: add codeshell support
    
    * llama.cpp: fix codeshell with NeoX rope
    
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    
    ---------
    
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    chiranko and ggerganov authored Jan 19, 2024
    Commit 2b3b999
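    For reference, NeoX-style RoPE rotates the two halves of each head
    vector together, rather than interleaved even/odd pairs as in the
    original rotary variant; a minimal NumPy sketch (not the ggml
    implementation):

    ```python
    import numpy as np

    def rope_neox(x, pos, base=10000.0):
        """Apply NeoX-style rotary embedding to one head vector.

        x   : (head_dim,) query/key vector for a single position
        pos : token position
        """
        half = x.shape[0] // 2
        freqs = base ** (-np.arange(half) / half)  # theta_i
        angles = pos * freqs
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[:half], x[half:]
        # Each (x1[i], x2[i]) pair is rotated by angle pos * theta_i.
        return np.concatenate([x1 * cos - x2 * sin,
                               x1 * sin + x2 * cos])
    ```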
  4. winogrande: evaluate log-probs in parallel (#5036)

    This is a relatively minor performance tweak resulting in
    ~10% speedup on my system.
    
    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
    ikawrakow and Kawrakow authored Jan 19, 2024
    Commit 7051aac
  5. py : fix flake8 lint

    ggerganov committed Jan 19, 2024
    Commit de9a147
  6. Commit 9b75cb2
  7. imatrix : add README.md

    ggerganov authored Jan 19, 2024
    Commit a5cacb2
  8. finetune : fix ggml_allocr lifetimes (tmp workaround) (#5033)

    * Fix issue with alloc causing max_compute_size to be calculated incorrectly
    
    * remove ggml_allocr_free as suggested in issue #4791
    bzuzo authored Jan 19, 2024
    Commit 381ee19
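    A toy analogy of the lifetime issue (Python; ggml_allocr itself is a
    C API):

    ```python
    class Arena:
        """Views handed out by the arena are valid only while its buffer
        is alive, so the arena must outlive every tensor allocated from
        it; the workaround keeps it referenced for the whole graph's
        lifetime instead of freeing it early."""

        def __init__(self, size):
            self.buf = bytearray(size)
            self.offset = 0

        def alloc(self, n):
            view = memoryview(self.buf)[self.offset:self.offset + n]
            self.offset += n
            return view  # dangles if the buffer is released while in use
    ```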