New conversion script #545

Merged 1 commit on Apr 14, 2023

Commits on Apr 14, 2023

  1. New conversion script (ggerganov#545)

      Current status: Working, except for the latest GPTQ-for-LLaMa format
      that includes `g_idx`.  This turns out to require changes to GGML, so
      for now it only works if you use the `--outtype` option to dequantize it
      back to f16 (which is pointless except for debugging).
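
      (For reference, dequantizing a 4-bit group-quantized tensor back to f16
      boils down to something like the NumPy sketch below; shapes and names
      are illustrative, not the script's actual code, and it assumes the
      4-bit values have already been unpacked.)

      ```python
      import numpy as np

      def dequantize_groups(qvals: np.ndarray, scales: np.ndarray,
                            zeros: np.ndarray, group_size: int = 128) -> np.ndarray:
          # qvals:  (rows, cols) integers in [0, 15], already unpacked from 4-bit storage
          # scales: (rows, cols // group_size) per-group scale factors
          # zeros:  (rows, cols // group_size) per-group zero points
          rows, cols = qvals.shape
          out = np.empty((rows, cols), dtype=np.float16)
          for g in range(cols // group_size):
              sl = slice(g * group_size, (g + 1) * group_size)
              out[:, sl] = (qvals[:, sl] - zeros[:, g:g + 1]) * scales[:, g:g + 1]
          return out
      ```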
    
      I also included some cleanup for the C++ code.
    
      This script is meant to replace all the existing conversion scripts
      (including the ones that convert from older GGML formats), while also
      adding support for some new formats.  Specifically, I've tested with:
    
      - [x] `LLaMA` (original)
      - [x] `llama-65b-4bit`
      - [x] `alpaca-native`
      - [x] `alpaca-native-4bit`
      - [x] LLaMA converted to 'transformers' format using
            `convert_llama_weights_to_hf.py`
      - [x] `alpaca-native` quantized with `--true-sequential --act-order
            --groupsize 128` (dequantized only)
      - [x] same as above plus `--save_safetensors`
      - [x] GPT4All
      - [x] stock unversioned ggml
      - [x] ggmh
    
      There's enough overlap in the logic needed to handle these different
      cases that it seemed best to move to a single script.
    
      I haven't tried this with Alpaca-LoRA because I don't know where to find
      it.
    
      Useful features:
    
      - Uses multiple threads for a speedup in some cases (though the Python
        GIL limits the gain, and sometimes it's disk-bound anyway).  A rough
        sketch of this follows the list.
    
      - Combines split models into a single file (both the intra-tensor split
        of the original and the inter-tensor split of 'transformers' format
        files).  Single files are more convenient to work with and friendlier
        to future changes such as memory mapping on the C++ side.  To
        accomplish this without increasing memory requirements, it uses custom
        loading code that avoids reading whole input files into memory at
        once; a sketch of the lazy-loading idea follows this list.
    
      - Because of the custom loading code, it no longer depends on PyTorch,
        which might make installing dependencies slightly easier or faster...
        although it still depends on NumPy and sentencepiece, so I don't know
        if there's any meaningful difference.  In any case, I also added a
        requirements.txt file to lock the dependency versions in case of any
        future breaking changes.
    
      - Type annotations checked with mypy.
    
      - Some attempts to be extra user-friendly:
    
          - The script tries to be forgiving with arguments, e.g. you can
            specify either the model file itself or the directory containing
            it (see the sketch after this list).
    
          - The script doesn't depend on config.json / params.json, just in
            case the user downloaded files individually and doesn't have those
            handy.  But you still need tokenizer.model and, for Alpaca,
            added_tokens.json.
    
          - The script tries to give a helpful error message if
            added_tokens.json is missing.
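
      The multithreading mentioned above is plain thread-pool fan-out; a
      minimal sketch (illustrative, not the script's actual code) looks like
      this.  Threads only pay off where the work releases the GIL, e.g. NumPy
      operations and disk I/O:

      ```python
      import concurrent.futures

      def convert_all(tensors, convert_one, max_workers: int = 8):
          # Fan the per-tensor conversion work out over a thread pool.  The GIL
          # serializes pure-Python work, but NumPy and file I/O release it, so
          # this still helps in some cases.
          with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
              return list(pool.map(convert_one, tensors))
      ```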
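
      The lazy-loading idea behind the split-model handling can be sketched
      like this (illustrative names, not the script's actual classes): each
      tensor is represented by a thunk that reads its data only when needed,
      and shards are concatenated lazily so no whole input file sits in
      memory at once.

      ```python
      from dataclasses import dataclass
      from typing import Callable, List, Tuple
      import numpy as np

      @dataclass
      class LazyTensor:
          load: Callable[[], np.ndarray]  # reads this tensor's data on demand
          shape: Tuple[int, ...]

      def combine_shards(parts: List[LazyTensor], axis: int) -> LazyTensor:
          # Concatenate an intra-tensor split lazily: nothing is read from disk
          # until .load() is called, typically right before writing the tensor
          # to the output file.
          def load() -> np.ndarray:
              return np.concatenate([p.load() for p in parts], axis=axis)
          shape = list(parts[0].shape)
          shape[axis] = sum(p.shape[axis] for p in parts)
          return LazyTensor(load=load, shape=tuple(shape))
      ```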
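
      The forgiving argument handling and the missing-file error mentioned
      above could look roughly like this (file names and function names are
      illustrative assumptions, not the script's actual code):

      ```python
      import json
      from pathlib import Path

      def find_model_file(path: Path) -> Path:
          # Accept either the model file itself or the directory containing it.
          if path.is_file():
              return path
          for name in ("consolidated.00.pth", "pytorch_model.bin", "ggml-model.bin"):
              candidate = path / name
              if candidate.exists():
                  return candidate
          raise SystemExit(f"Could not find a model file in {path}")

      def load_added_tokens(model_dir: Path) -> dict:
          # Alpaca models need added_tokens.json; fail with a pointer to what is
          # missing instead of a bare traceback.
          added = model_dir / "added_tokens.json"
          if not added.exists():
              raise SystemExit(
                  f"Missing {added}: Alpaca models need added_tokens.json next to "
                  "tokenizer.model; download it from the model's original repository.")
          return json.loads(added.read_text())
      ```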
    comex committed Apr 14, 2023 (commit 241065e)