
SHA256 checksums correctness #374

Closed
anzz1 opened this issue Mar 21, 2023 · 12 comments
Labels: bug (Something isn't working), model (Model specific)

Comments

@anzz1
Contributor

anzz1 commented Mar 21, 2023

Not all of these checksums seem to be correct. Were they calculated with the new "v2" model format after the tokenizer change? PR: #252 Issue: #324

For example, "models/alpaca-7B/ggml-model-q4_0.bin"

v1: 1f582babc2bd56bb63b33141898748657d369fd110c4358b2bc280907882bf13
v2: 8d5562ec1d8a7cfdcf8985a9ddf353339d942c7cf52855a92c9ff59f03b541bc

The SHA256SUMS file has the old v1 hash.
Maybe using a naming scheme like "ggml2-model-q4_0.bin" would be good to differentiate between the versions and avoid confusion.

Originally posted by @anzz1 in #338 (comment)

edit: After converting the models to the new format, I found out that the "v2" hash above is also incorrect.
The sha256 for ./models/alpaca-7B-ggml/ggml-model-q4_0.bin is supposed to be 2fe0cd21df9c235c0d917c14e1b18d2d7320ed5d8abe48545518e96bb4227524
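
For anyone who wants to double-check their own files, a minimal Python sketch for hashing a large model file (the path below is just an example):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    # Hash in 1 MiB chunks; the quantized models are several GiB,
    # so reading the whole file at once is not an option.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

print(sha256_of("models/alpaca-7B-ggml/ggml-model-q4_0.bin"))
```

If the file was converted with the new tokenizer format, this should print the 2fe0cd21... hash above.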

@gjmulder
Collaborator

I'm still in the process of finding/converting the 7B and 13B alpaca models to ggml2.

I'll then recompute all the hashes with the latest build, and also provide a file with the magic numbers and versions for each.

@gjmulder added the bug (Something isn't working) and model (Model specific) labels Mar 22, 2023
@Green-Sky
Collaborator

The new ggml file format has the version number 1, so calling it ggml2 or "v2" is going to cause confusion. The new file format switched the file magic from "ggml" to "ggmf"; maybe we should lean into that.
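
To make that concrete, here is a minimal sketch that reports the format of a file, assuming the magic is stored as a little-endian uint32 at the start of the file (as llama.cpp writes it) and that "ggmf" files follow it with a uint32 version:

```python
import struct
import sys

GGML_MAGIC = 0x67676D6C  # "ggml": old, unversioned format
GGMF_MAGIC = 0x67676D66  # "ggmf": new format, followed by a uint32 version

def describe(path: str) -> str:
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
        if magic == GGML_MAGIC:
            return "ggml (old tokenizer format, no version field)"
        if magic == GGMF_MAGIC:
            (version,) = struct.unpack("<I", f.read(4))
            return f"ggmf version {version} (new tokenizer format)"
        return f"unknown magic 0x{magic:08x}"

print(describe(sys.argv[1]))
```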

@anzz1
Contributor Author

anzz1 commented Mar 23, 2023

Some checksums (q4_0 and gptq-4b quantizations, new tokenizer format)

ggml-q4-checksums.zip

edit: added more checksums

anzz1 added a commit that referenced this issue Mar 23, 2023
Delete this for now to avoid confusion since it contains some wrong checksums from the old tokenizer format
Re-add after #374 is resolved
sw pushed a commit that referenced this issue Mar 23, 2023
Delete this for now to avoid confusion since it contains some wrong checksums from the old tokenizer format
Re-add after #374 is resolved
@gjmulder
Collaborator

Some checksums (q4_0 quantization, new tokenizer format)

ggml-q4_0-checksums.zip

I'd trust your checksums for the alpaca models over mine.

$ sha256sum -c SHA256SUMS.gary
alpaca-13B-ggml/ggml-model-q4_0.bin: FAILED
alpaca-13B-ggml/params.json: FAILED open or read
alpaca-13B-ggml/tokenizer.model: FAILED open or read
alpaca-30B-ggml/ggml-model-q4_0.bin: OK
alpaca-30B-ggml/params.json: OK
alpaca-30B-ggml/tokenizer.model: FAILED open or read
alpaca-7B-ggml/ggml-model-q4_0.bin: FAILED
alpaca-7B-ggml/params.json: FAILED open or read
alpaca-7B-ggml/tokenizer.model: FAILED open or read
llama-13B-ggml/ggml-model-q4_0.bin: OK
llama-13B-ggml/ggml-model-q4_0.bin.1: OK
llama-13B-ggml/params.json: OK
llama-13B-ggml/tokenizer.model: FAILED open or read
llama-30B-ggml/ggml-model-q4_0.bin: OK
llama-30B-ggml/ggml-model-q4_0.bin.1: OK
llama-30B-ggml/ggml-model-q4_0.bin.2: OK
llama-30B-ggml/ggml-model-q4_0.bin.3: OK
llama-30B-ggml/params.json: OK
llama-30B-ggml/tokenizer.model: FAILED open or read
llama-65B-ggml/ggml-model-q4_0.bin: OK
llama-65B-ggml/ggml-model-q4_0.bin.1: OK
llama-65B-ggml/ggml-model-q4_0.bin.2: OK
llama-65B-ggml/ggml-model-q4_0.bin.3: OK
llama-65B-ggml/ggml-model-q4_0.bin.4: OK
llama-65B-ggml/ggml-model-q4_0.bin.5: OK
llama-65B-ggml/ggml-model-q4_0.bin.6: OK
llama-65B-ggml/ggml-model-q4_0.bin.7: OK
llama-65B-ggml/params.json: OK
llama-65B-ggml/tokenizer.model: FAILED open or read
llama-7B-ggml/ggml-model-q4_0.bin: OK
llama-7B-ggml/params.json: OK
llama-7B-ggml/tokenizer.model: FAILED open or read

@Green-Sky
Collaborator

The problem with the alpaca models is that there are a lot of different ones, by different people.

@gjmulder
Collaborator

The problem with the alpaca models is that there are a lot of different ones, by different people.

Yes. However, we're supporting them, so we need to decide what we can support.

@gjmulder
Collaborator

Upvote for @anzz1's new naming convention for the various model subdirs.

@Green-Sky
Collaborator

@anzz1 Why is the tokenizer.model duplicated everywhere? AFAIK there is only one.

@anzz1
Contributor Author

anzz1 commented Mar 23, 2023

@Green-Sky Yeah, there is only one; I might be thinking ahead too much. 😄

Also added some more checksums for gptq-4b models above: #374 (comment)

@Green-Sky
Collaborator

IMHO, we should move the alpaca checksums to a discussion, with a thread for each individual model, with source, credits, and converted checksums.
I don't think we can tame the diverse 🦙 herd otherwise.

@gjmulder
Collaborator

How about an individual SHA256SUMS.model_type file per model type?

That way we have some granularity and it is self-documenting for new users who don't know a llama from an alpaca.

@anzz1
Contributor Author

anzz1 commented Mar 23, 2023

Yes, it might be good to differentiate them, as some have short fur and some long, and some are more friendly than others.
But llamas will always be llamas, and alpacas will be many. Llamas are stable, but alpacas are wild cards. I don't see much value in documenting a million different alpaca variations; there should be a standard set to test against, but otherwise there's no point in trying to document every grain of sand on the beach.

One "standard" sum per model type seems to make the most sense. I can't see why they would need to be their own files though, as I'm not a big fan of littering the repo with dozens of files when the same thing can be achieved with dozens of lines in a single file.

I agree this should be moved to discussions, as it will be an ongoing thing.

Repository owner locked and limited conversation to collaborators Mar 23, 2023
@anzz1 anzz1 converted this issue into discussion #433 Mar 23, 2023

