Fix OLMo HF to GGUF conversion #6910

Merged (5 commits) on May 7, 2024

Conversation

nopperl (Contributor) commented Apr 25, 2024

Fix the HF to GGUF conversion of OLMo models:
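This PR was originally titled "Properly set clamp_qkv value in OLMo conversion" (see the title-change event below). A minimal sketch of what passing that value through the converter could look like, assuming the HF config exposes it as clip_qkv and that gguf-py's GGUFWriter provides add_clamp_kqv; neither detail is taken from this PR's actual diff:

# Sketch only (not this PR's diff): a set_gguf_parameters() override for an OLMo model
# class in convert-hf-to-gguf.py. Assumes the HF config field is "clip_qkv" and that
# gguf-py's GGUFWriter exposes add_clamp_kqv() for the "{arch}.attention.clamp_kqv" key.
def set_gguf_parameters(self):
    super().set_gguf_parameters()
    clip_qkv = self.hparams.get("clip_qkv")
    if clip_qkv is not None:
        # OLMo clamps the Q/K/V projections to [-clip_qkv, clip_qkv]
        self.gguf_writer.add_clamp_kqv(clip_qkv)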

josharian commented

I found this PR via #6712, which I am also experiencing. I patched this PR in and got a new failure. llama.cpp version b8c1476 (head as of right now).

$ python convert-hf-to-gguf.py OLMo-7B-hf --outfile olmo-7b
Loading model: OLMo-7B-hf
gguf: This GGUF file is for Little Endian only
Set model parameters
gguf: context length = 2048
gguf: embedding length = 4096
gguf: feed forward length = 11008
gguf: head count = 32
gguf: key-value head count = 32
gguf: rope theta = 10000.0
gguf: file type = 1
Set model tokenizer
chktok: [586, 1744, 33525, 186, 209, 623, 28910, 187, 50276, 187, 50275, 187, 50274, 187, 50273, 187, 14931, 237, 211, 313, 6320, 10, 49042, 116, 325, 224, 14931, 223, 106, 171, 118, 226, 313, 34263, 802, 13511, 261, 32147, 456, 10, 3384, 239, 216, 22692, 101, 236, 14931, 101, 236, 495, 5922, 30057, 495, 20084, 495, 26409, 30057, 20084, 495, 26409, 1610, 495, 26409, 20084, 495, 15, 20, 495, 537, 20, 495, 1051, 20, 209, 18081, 211, 18081, 116, 18081, 230, 39936, 222, 18081, 226, 39936, 213, 18081, 233, 18081, 117, 18081, 242, 39936, 212, 18081, 242, 18081, 97, 18081, 116, 18081, 216, 14931, 235, 212, 3736, 15367, 41197, 13610, 19934, 41869, 21275, 1012, 1047, 18795, 40120, 20422, 241, 16081, 6877, 12880, 11514, 1068, 8713, 38177, 13396, 3415, 9925, 12559, 10453, 1389, 42011, 35033, 34842, 11202, 9739, 9739, 33021, 18963, 4672, 25561, 8220, 309, 1849, 644, 686, 42618, 344, 434, 627, 13, 686, 1848, 368, 2119, 32, 686, 46, 417, 2119, 309, 1833, 1056, 352, 13, 686, 37, 368, 751, 690, 10331, 32, 844, 8, 31516, 247, 8, 77, 45, 50279]
chkhsh: 252ad757e225d729882d4763e69f762dc6311bb819eb2c0288817e7bbe9b99d9


**************************************************************************************
** WARNING: The BPE pre-tokenizer was not recognized!
**          This means that it was not added yet or you are using an older version.
**          Check convert-hf-to-gguf-update.py and update it accordingly.
**
** chkhsh:  252ad757e225d729882d4763e69f762dc6311bb819eb2c0288817e7bbe9b99d9
**************************************************************************************


Traceback (most recent call last):
  File "/Users/josh/x/llama.cpp/convert-hf-to-gguf.py", line 3569, in <module>
    main()
  File "/Users/josh/x/llama.cpp/convert-hf-to-gguf.py", line 3556, in main
    model_instance.set_vocab()
  File "/Users/josh/x/llama.cpp/convert-hf-to-gguf.py", line 103, in set_vocab
    self._set_vocab_gpt2()
  File "/Users/josh/x/llama.cpp/convert-hf-to-gguf.py", line 418, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
                               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/josh/x/llama.cpp/convert-hf-to-gguf.py", line 321, in get_vocab_base
    tokpre = self.get_vocab_base_pre(tokenizer)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/josh/x/llama.cpp/convert-hf-to-gguf.py", line 408, in get_vocab_base_pre
    raise NotImplementedError(
NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()

nopperl (Contributor, Author) commented Apr 29, 2024

It seems like the error comes from the BPE pre-tokenization merged in #6920.
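For reference, this class of error goes away once the new tokenizer hash is registered in the converter. A minimal sketch of the kind of branch added to get_vocab_base_pre() in convert-hf-to-gguf.py, using the chkhsh printed in the log above; the pre-tokenizer name "olmo" is an assumption and must also be handled by llama.cpp itself (see ggerganov's comment further down):

# Sketch only: mirrors the structure of Model.get_vocab_base_pre() in convert-hf-to-gguf.py.
# The hash is the chkhsh from the log above; the name "olmo" is assumed, not confirmed here.
def get_vocab_base_pre_sketch(chkhsh: str) -> str:
    res = None
    if chkhsh == "252ad757e225d729882d4763e69f762dc6311bb819eb2c0288817e7bbe9b99d9":
        # ref: https://huggingface.co/allenai/OLMo-7B-hf
        res = "olmo"
    if res is None:
        raise NotImplementedError("BPE pre-tokenizer was not recognized - update get_vocab_base_pre()")
    return res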

nopperl force-pushed the fix-olmo-conversion branch 2 times, most recently from d49e252 to 00f3fb6, on May 5, 2024 19:07
nopperl changed the title from "Properly set clamp_qkv value in OLMo conversion" to "Fix OLMo HF to GGUF conversion" on May 5, 2024
nopperl (Contributor, Author) commented May 5, 2024

@josharian I have fixed the conversion issue, it should work now.

github-actions bot commented May 5, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 563 iterations 🚀

Details (performance-related PRs only)
  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8261.22ms p(95)=19563.9ms fails=, finish reason: stop=494 truncated=69
  • Prompt processing (pp): avg=88.17tk/s p(95)=354.58tk/s
  • Token generation (tg): avg=33.34tk/s p(95)=48.04tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=fix-olmo-conversion commit=25be8f5cd5c9ea500d10588ae90c1a51816ad066

[Benchmark charts omitted: llamacpp:prompt_tokens_seconds, llamacpp:predicted_tokens_seconds, llamacpp:kv_cache_usage_ratio, and llamacpp:requests_processing for bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 563 iterations.]

lsetiawan commented
@nopperl Thank you for providing this fix. I can confirm that it works. Could someone please merge this PR? I would like to share this capability at the 2024 SciPy Conference.

ggerganov (Owner) commented
Hm, does it really work? llama.cpp does not know how to handle the "olmo" pre-tokenizer. It would crash here:

llama.cpp/llama.cpp, lines 4392 to 4394 in 3af34c1:

} else {
    throw std::runtime_error(format("unknown pre-tokenizer type: '%s'", tokenizer_pre.c_str()));
}

nopperl (Contributor, Author) commented May 7, 2024

@ggerganov You're right; I tested it with an older binary. I'll try to fix it.

Galunid (Collaborator) commented May 7, 2024

It looks alright, I'm downloading the model to test it. If it works I'll merge it, unless there's something more you want to add here?

nopperl (Contributor, Author) commented May 7, 2024

> It looks alright, I'm downloading the model to test it. If it works I'll merge it, unless there's something more you want to add here?

Nice, I don't think there's anything else to add if it works.

Galunid (Collaborator) left a review:

Looks good

Galunid merged commit b6aa670 into ggerganov:master on May 7, 2024. 56 of 61 checks passed.
lsetiawan commented

Wow! Thank you all for your super quick response and @nopperl for working to fix this. I really appreciate everyone's input 😄 This is really exciting!

nopperl (Contributor, Author) commented May 7, 2024

@lsetiawan no problem!

> I would like to share this capability at the 2024 SciPy Conference.

I'm interested in that, could you send me more info on what you're planning to do?

lsetiawan commented

For sure! For anyone interested, my team at the University of Washington Scientific Software Engineering Center has been working on a tutorial on a RAG-based approach using OLMo as the LLM. Since the regular OLMo-7B-Instruct model has very slow inference speed, we've been looking into ways to quantize it and speed things up, especially on CPU; that's how we came across llama.cpp and the progress made integrating OLMo so that it can be converted to GGUF and quantized. Thanks to everyone's contributions, we have successfully converted OLMo-7B-Instruct to GGUF format and quantized it to 4-bit with the Q4_K_M method: https://huggingface.co/ssec-uw/OLMo-7B-Instruct-GGUF

nopperl (Contributor, Author) commented May 8, 2024

@lsetiawan very interesting, nice to see that this contribution is useful to others.

Also great that you were able to convert the instruct model to HF format, which should be more useful for most users. However, I don't think the conversion works properly because it's missing tokenizer.json and tokenizer_config.json. You should be able to use these from allenai/OLMo-7B-hf. I also recommend setting the chat_template in tokenizer_config.json to the one from allenai/OLMo-7B-Instruct, so it can be used automatically.
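A rough sketch of how that could be done with the standard transformers API; the template string and output directory below are placeholders, and the real chat_template should be copied from allenai/OLMo-7B-Instruct's tokenizer_config.json:

# Sketch: reuse the tokenizer files from allenai/OLMo-7B-hf and attach a chat template
# before converting the HF model to GGUF. Placeholder template and output path; not a
# tested recipe for the ssec-uw upload.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-hf")
tokenizer.chat_template = "<paste the chat_template from allenai/OLMo-7B-Instruct here>"
# save_pretrained writes tokenizer.json and tokenizer_config.json (including chat_template)
tokenizer.save_pretrained("OLMo-7B-Instruct-hf")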

Labels: none yet
Projects: none yet

Successfully merging this pull request may close these issues: truly opensource model called olmo

6 participants