Conversation
Looks good overall! Just have a couple of comments that I feel are worth discussing
I've been messing around with adding compression to this. Weirdly, the natural approach of just wrapping the reader/writer in compression functions is unbearably slow (including loading/decompressing). However, compressing to a buffer directly is basically instant. Something pretty weird is going on.
Pullception: https://github.com/philpax/llama-rs/pull/1 Commenting here in case you want to keep discussion in one place.
Add zstd compression support for session loading/saving.
Looks good 👍 I think this one's ready to merge now
Fixes #38.
I've had to make a few slightly controversial changes here:

- `InferenceSession` now stores all tokens that have been processed, not just the last N tokens
- `inference_with_prompt` now plays back all tokens that have been processed
- `inference_with_prompt` (I realised that the `while` condition was unnecessary since we can just return the error anyway)

On the plus side, it basically works as you'd expect: