
SaveState / LoadState not working on 8-bit quantized gguf models #260

Closed
BrainSlugs83 opened this issue Nov 6, 2023 · 11 comments
Labels: bug (Something isn't working)

BrainSlugs83 commented Nov 6, 2023

I'm not sure whether this affects other model types; I'm only testing 8-bit models right now, so it might be a wider bug. (Specifically, this happens for me with openchat_3.5.Q8_0.gguf.)

I'm using the following parameters:

var parameters = new ModelParams(@"C:\models\openchat_3.5.Q8_0.gguf")
{
    ContextSize = 8 * 1024,
    Seed = 1337,
    GpuLayerCount = 15
};

Calling InteractiveExecutor.SaveState produces a JSON file containing the correct tokens (you can pass them to the tokenizer to verify), among other values.
But then calling InteractiveExecutor.LoadState on a new instance just causes it to spit out random garbled text that doesn't even form coherent sentences.

The same problem happens with GetStateData() and LoadState as well.
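
For reference, here's roughly the flow (a trimmed sketch, not my exact code; it assumes the 0.7-era LLamaWeights / CreateContext / InteractiveExecutor API, and the paths and skipped inference step are placeholders):

using LLama;
using LLama.Common;

var parameters = new ModelParams(@"C:\models\openchat_3.5.Q8_0.gguf")
{
    ContextSize = 8 * 1024,
    Seed = 1337,
    GpuLayerCount = 15
};

using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);
var executor = new InteractiveExecutor(context);

// ... run some inference here, then snapshot the executor to disk ...
executor.SaveState(@"C:\states\session.json"); // writes a JSON file with the tokens, etc.

// Later: a fresh executor on a fresh context, restored from the file.
using var context2 = weights.CreateContext(parameters);
var executor2 = new InteractiveExecutor(context2);
executor2.LoadState(@"C:\states\session.json");
// Expected: inference continues the saved session.
// Observed: the output is random garbled text.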

Btw, I'm using LLamaSharp 0.5.1 and the Cuda11 backend.

@martindevans (Member)

Could you try this with the newer 0.7.0 release to confirm whether it's still an issue? Thanks.

martindevans added the bug label Nov 6, 2023
@BrainSlugs83 (Author)

Yes, it still reproduces in 0.7.0. But also, inference is about 10x slower on my machine than it was in 0.5.1 with the Cuda11 backend (using 0.7.0 for both LLamaSharp and LLamaSharp.Backend.Cuda).

martindevans (Member) commented Nov 7, 2023

The speed problem is a known issue; fortunately, we already have a fix for that merged into master! I expect we'll be making a release soon.

As for the state problem, I'll look into it. There's been a huge change in the llama.cpp internals which has probably broken state handling somehow.

AsakusaRinne moved this to 📋 TODO in LLamaSharp Dev Nov 9, 2023
AsakusaRinne mentioned this issue Nov 13, 2023
AsakusaRinne moved this from 📋 TODO to 🏗 In progress in LLamaSharp Dev Nov 24, 2023
@AsakusaRinne (Collaborator)

Hi, I'm working on this issue, but I cannot reproduce it with openchat_3.5.Q8_0.gguf. Could you please provide a piece of code and some tips to reproduce it on the master branch? Note that the model may have been updated since you opened this issue, so please update to the latest file: https://huggingface.co/TheBloke/openchat_3.5-GGUF/blob/main/openchat_3.5.Q8_0.gguf

@BrainSlugs83 (Author)

I'll attempt to repro this tonight if I can.

@BrainSlugs83 (Author)

Looks like it's working for saving to and loading from a file, but not for GetStateData() / LoadState(). It seems some parameter in there is not getting updated during the state load, so old memory hangs around.
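
Roughly, the in-memory path that misbehaves (a sketch; GetStateData() / LoadState() are the members mentioned above, everything else is illustrative):

// Snapshot the executor's state into memory instead of a file.
var state = executor.GetStateData();

// ... more inference happens here ...

// Restore the snapshot onto the executor.
executor.LoadState(state);
// Expected: the executor behaves as if nothing after the snapshot happened.
// Observed: something isn't reset, so old memory hangs around.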

@AsakusaRinne (Collaborator)

Thanks, I'll try to reproduce it with GetStateData() / LoadState(). :)

BrainSlugs83 (Author) commented Dec 1, 2023

Actually, I spoke too soon...

I had a conversation lasting an hour or two using neural-chat-7b-v3-1.Q4_K_M.gguf (with a 4k context and the InteractiveExecutor)... I maxed out the tokens probably half an hour in, but it stayed coherent (is it just using a ring buffer?).

But when I tried to load a save file from earlier in the session using LoadState... well, it stayed coherent... but it still had all of the recent conversation in memory. -- So that seems like a fail to me.

I would expect each state to be self-contained, not to bleed through and contaminate other states -- so that when you load a state, it's the only thing loaded into the model, and the model has no other "memory" in it. Is that assumption incorrect? (If so, how can I achieve isolated behavior?)
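
In other words, these are the semantics I'd expect (a sketch; the file name is a placeholder):

executor.SaveState("early.json");   // snapshot taken early in the session

// ... half an hour more conversation; the context fills up ...

executor.LoadState("early.json");   // restore the early snapshot
// Expectation: only the conversation up to the snapshot is in memory now.
// Observed: the recent conversation sometimes still bleeds into replies.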

For example, if I run the program from scratch and load that state, everything is fine, and it only has the conversation up to that point (but loading it later in a session sometimes left other info in its memory somehow). Not sure if that makes any sense. It might be a different bug, I'm not sure.

At any rate, this is a huge improvement over previous versions, as it's at least kind of working now... sometimes... but it's still not 100% working, IMHO.

@AsakusaRinne (Collaborator)

Sorry for the late reply; I didn't notice your message at the time.

Is the following case what you mean?

model.Chat("xxx"); // first chat
model.SaveState("state1"); // save the state for some chat histories
model.Chat("xxx"); // second chat
model.SaveState("state2"); // save the state again

model.Load("state2") // You only want the memory during the second chat.

@AsakusaRinne (Collaborator)

By the way, the latest version is 0.10.0 now. :)

AsakusaRinne moved this from 🏗 In progress to ✅ Done in LLamaSharp Dev May 13, 2024
@AsakusaRinne (Collaborator)

Closing this issue as inactive. Please feel free to comment here if the problem still reproduces.
