This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Structural Overhaul #162

Merged: 40 commits from `dfo/model/bloom` into `main` on Apr 30, 2023. The diff below reflects the first 33 commits.

Commits:
d84aa7f Create a Model trait (danforbes, Apr 15, 2023)
e0713a1 Bloom model (danforbes, Apr 15, 2023)
6bfda75 cargo fmt (danforbes, Apr 16, 2023)
73f59c3 Rename llama-rs to llm-base (danforbes, Apr 16, 2023)
e670c25 Clippy (danforbes, Apr 16, 2023)
c4b4176 Remove redundant associated Model type from Model trait (danforbes, Apr 16, 2023)
1cf305f Remove associated Layer type from Model trait (danforbes, Apr 16, 2023)
0d4dde9 cargo fmt (danforbes, Apr 16, 2023)
849c28d Docs (danforbes, Apr 16, 2023)
54ad890 Tests and examples (danforbes, Apr 16, 2023)
4ba7c1c Layers are private (danforbes, Apr 16, 2023)
dcf85ff Merge branch 'main' of github.com:rustformers/llama-rs into dfo/model… (philpax, Apr 22, 2023)
43ecac1 Merge branch 'main' into dfo/model/bloom (philpax, Apr 25, 2023)
440bd69 Fix build (philpax, Apr 25, 2023)
5658484 refactor: introduce llm(-cli) (philpax, Apr 25, 2023)
bcf5627 Fix model name in LLaMA inference example (danforbes, Apr 26, 2023)
5ac4b79 feat: wire up both bloom/llama to CLI (philpax, Apr 26, 2023)
1601240 Merge branch 'dfo/model/bloom' of github.com:danforbes/llama-rs into … (philpax, Apr 26, 2023)
1761512 Add example for testing BLOOM inference (danforbes, Apr 26, 2023)
8d2d9c6 cargo fmt (danforbes, Apr 26, 2023)
813bdd1 Add launch.json for debugging loading and inference (danforbes, Apr 26, 2023)
c608b4b Merge branch 'main' into dfo/model/bloom (danforbes, Apr 27, 2023)
e19418c Check tensor dimensions when loading (danforbes, Apr 27, 2023)
e35f93b `Model` -> `KnownModel`, `ErasedModel` -> `Model` (danforbes, Apr 27, 2023)
288df7f Merge branch 'main' into dfo/model/bloom (danforbes, Apr 29, 2023)
0aea8f7 Refactor ggml stuff into a single crate (danforbes, Apr 27, 2023)
8594ac8 Use latest upstream ggml with alibi (danforbes, Apr 28, 2023)
a542c98 Improve examples (danforbes, Apr 28, 2023)
16fca15 Latest upstream ggml (danforbes, Apr 28, 2023)
974d2f7 Cleanup README (danforbes, Apr 28, 2023)
1abaa41 Rebase fix (danforbes, Apr 29, 2023)
f994fa8 GPT2/Cerebras loading and inference (danforbes, Apr 26, 2023)
ff99a80 Rebase & remove BLOOM (danforbes, Apr 30, 2023)
454f3a9 GitHub Action should support Git submodules (danforbes, Apr 30, 2023)
e69d487 Fix binary file name in README (danforbes, Apr 30, 2023)
608090b ggml-rs -> ggml (danforbes, Apr 30, 2023)
78db42c Add back BLOOM (danforbes, Apr 30, 2023)
1eb2e11 feat: re-enable BLOOM for now (philpax, Apr 30, 2023)
181d823 refactor: reintroduce ggml-sys and bindgen tool (philpax, Apr 30, 2023)
9314c68 fix: check out submodules for clippy CI (philpax, Apr 30, 2023)
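
Several commits above reshape the model abstraction: `Create a Model trait`, the removal of its associated `Model` and `Layer` types, and the rename `Model` -> `KnownModel`, `ErasedModel` -> `Model`. The PR's real traits live in `llm-base`; the sketch below only illustrates the general pattern those names suggest (a statically-typed trait per architecture behind an object-safe, type-erased trait), and every method signature in it is an assumption, not the crate's actual API:

```rust
/// Sketch only: implemented by each concrete architecture (LLaMA, GPT-2, ...).
/// Method names here are assumptions, not the actual `llm-base` API.
trait KnownModel {
    fn name(&self) -> &'static str;
    fn infer(&self, prompt: &str) -> String;
}

/// Sketch only: the object-safe, type-erased counterpart, so callers can hold
/// a `Box<dyn Model>` without knowing the architecture at compile time.
trait Model {
    fn name(&self) -> &'static str;
    fn infer(&self, prompt: &str) -> String;
}

/// Blanket impl: every statically-known model is usable as an erased `Model`.
impl<M: KnownModel> Model for M {
    fn name(&self) -> &'static str {
        KnownModel::name(self)
    }

    fn infer(&self, prompt: &str) -> String {
        KnownModel::infer(self, prompt)
    }
}
```

This split lets the CLI dispatch over `Box<dyn Model>` at runtime while each architecture crate keeps a concrete, statically-typed implementation.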
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "ggml-rs/ggml"]
path = ggml-rs/ggml
url = git@github.com:ggerganov/ggml.git
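
Because `ggml` is now vendored as a Git submodule, a checkout made with plain `git clone` will leave `ggml-rs/ggml` empty. This is standard Git (not specific to this repository): the submodule can be fetched after the fact with

```shell
git submodule update --init --recursive
```

and the README change later in this PR adds `--recurse-submodules` to the clone instructions for the same reason.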
44 changes: 44 additions & 0 deletions .vscode/launch.json
@@ -0,0 +1,44 @@
{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"type": "lldb",
"request": "launch",
"name": "Debug example 'gpt2_inference'",
"cargo": {
"args": [
"build",
"--example=gpt2_inference",
"--package=gpt2"
],
"filter": {
"name": "gpt2_inference",
"kind": "example"
}
},
"args": ["${env:HOME}/.ggml-models/cerebras-gpt-13b.bin"],
"cwd": "${workspaceFolder}"
},
{
"type": "lldb",
"request": "launch",
"name": "Debug example 'llama_inference'",
"cargo": {
"args": [
"build",
"--example=llama_inference",
"--package=llama"
],
"filter": {
"name": "llama_inference",
"kind": "example"
}
},
"args": ["${env:HOME}/.ggml-models/gpt4all-7b.bin"],
"cwd": "${workspaceFolder}"
}
]
}
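
Both configurations use `"type": "lldb"`, the debugger type registered by the CodeLLDB extension for VS Code, so they assume that extension is installed. The same example can also be run without the debugger; the command below mirrors the `llama_inference` configuration, with the model path taken directly from it (substitute your own GGML model file):

```shell
cargo run --release --example llama_inference --package llama -- "$HOME/.ggml-models/gpt4all-7b.bin"
```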
99 changes: 57 additions & 42 deletions Cargo.lock


16 changes: 10 additions & 6 deletions Cargo.toml
@@ -1,16 +1,20 @@
 [workspace]
 members = [
-    "ggml-sys",
-    "ggml",
-    "ggml-format",
-    "llama-rs",
-    "llama-cli",
-    "generate-ggml-bindings"
+    # Crates
+    "ggml-rs",
+    "llm-base",
+    "gpt2",
+    "llama",
+    "llm",
+    "llm-cli",
 ]
 resolver = "2"
 
 [workspace.package]
 version = "0.1.0"
 
 [workspace.dependencies]
 bytemuck = "1.13.1"
+log = "0.4"
+rand = "0.8.5"
+serde = { version = "1.0", features = ["derive"] }
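
The `[workspace.package]` and `[workspace.dependencies]` tables let member crates inherit a single version and a single set of dependency pins. As a sketch using Cargo's standard workspace-inheritance syntax (this is not the literal contents of any crate in this PR), a member `Cargo.toml` could opt in like this:

```toml
[package]
name = "llm-base"          # hypothetical member, shown for illustration
version.workspace = true   # inherits 0.1.0 from [workspace.package]
edition = "2021"

[dependencies]
# Versions and features come from [workspace.dependencies] in the root manifest.
serde = { workspace = true }
rand = { workspace = true }
log = { workspace = true }
```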
69 changes: 36 additions & 33 deletions README.md
@@ -1,39 +1,31 @@
 # LLaMA-rs
 
-> Do the LLaMA thing, but now in Rust 🦀🚀🦙
+<!-- markdownlint-disable-file MD026 -->
+This project is a Rust port of
+[llama.cpp](https://github.com/ggerganov/llama.cpp) 🦙🦀🚀

[Review comment from a Collaborator on the wording above: "I'm thinking we'll remove this wording as we grow to accommodate more LLMs. I'll revise the wording on this after this PR lands, so nothing for you to do here - just mentioning it."]

 
-![A llama riding a crab, AI-generated](./doc/resources/logo2.png)
-
-> _Image by [@darthdeus](https://github.com/darthdeus/), using Stable Diffusion_
-
-[![ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/F1F8DNO5D)
+Just like its C++ counterpart, it is powered by the
+[`ggml`](https://github.com/ggerganov/ggml) tensor library, which allows running
+inference for Facebook's [LLaMA](https://github.com/facebookresearch/llama)
+model on a CPU with good performance using full precision, f16 or 4-bit
+quantized versions of the model.
 
 [![Latest version](https://img.shields.io/crates/v/llama-rs.svg)](https://crates.io/crates/llama_rs)
 ![MIT/Apache2](https://shields.io/badge/license-MIT%2FApache--2.0-blue)
 [![Discord](https://img.shields.io/discord/1085885067601137734)](https://discord.gg/YB9WaXYAWU)
 
-![Gif showcasing language generation using llama-rs](./doc/resources/llama_gif.gif)
-
-**LLaMA-rs** is a Rust port of the
-[llama.cpp](https://github.com/ggerganov/llama.cpp) project. This allows running
-inference for Facebook's [LLaMA](https://github.com/facebookresearch/llama)
-model on a CPU with good performance using full precision, f16 or 4-bit
-quantized versions of the model.
+![A llama riding a crab, AI-generated](./doc/resources/logo2.png)
 
-Just like its C++ counterpart, it is powered by the
-[`ggml`](https://github.com/ggerganov/ggml) tensor library, achieving the same
-performance as the original code.
+> _Image by [@darthdeus](https://github.com/darthdeus/), using Stable Diffusion_
 
 ## Getting started
 
 Make sure you have a Rust 1.65.0 or above and C toolchain[^1] set up.
 
-`llama-rs` is a Rust library, while `llama-cli` is a CLI application that wraps
-`llama-rs` and offers basic inference capabilities.
+`llm-base`, `gpt2`, and `llama` are Rust libraries, while `llm-cli` is a CLI
+application that wraps `gpt2` and `llama` and offers basic inference
+capabilities.
 
-The following instructions explain how to build `llama-cli`.
+The following instructions explain how to build the CLI application.
 
 **NOTE**: For best results, make sure to build and run in release mode.
 Debug builds are going to be very slow.
@@ -43,41 +35,45 @@
 Run
 
 ```shell
-cargo install --git https://github.com/rustformers/llama-rs llama-cli
+cargo install --git https://github.com/rustformers/llama-rs llm-cli
 ```
 
-to install `llama-cli` to your Cargo `bin` directory, which `rustup` is likely to
+to install `llm-cli` to your Cargo `bin` directory, which `rustup` is likely to
 have added to your `PATH`.
 
-It can then be run through `llama-cli`.
+The CLI application can then be run through `llm-cli`.
+
+![Gif showcasing language generation using llama-rs](./doc/resources/llama_gif.gif)
 
 ### Building from repository
 
-Clone the repository, and then build it through
+Clone the repository and then build it with
 
 ```shell
-cargo build --release --bin llama-cli
+git clone --recurse-submodules git@github.com:rustformers/llama-rs.git
+cargo build --release
 ```
 
-The resulting binary will be at `target/release/llama-cli[.exe]`.
+The resulting binary will be at `target/release/llm-cli[.exe]`.
 
 It can also be run directly through Cargo, using
 
 ```shell
-cargo run --release --bin llama-cli -- <ARGS>
+cargo run --release --bin llm-cli -- <ARGS>
 ```
 
 This is useful for development.
 
-### Getting the weights
+### Getting LLaMA weights
 
 In order to run the inference code in `llama-rs`, a copy of the model's weights
 is required.
 
 #### From Hugging Face
 
 Compatible weights - not necessarily the original LLaMA weights - can be found
-on [Hugging Face by searching for GGML](https://huggingface.co/models?search=ggml). At present, LLaMA-architecture models are supported.
+on [Hugging Face by searching for GGML](https://huggingface.co/models?search=ggml).
+At present, LLaMA-architecture models are supported.
 
 #### LLaMA original weights
 
@@ -107,6 +103,13 @@ cargo run -p llama-cli quantize /path/to/your/models/7B/ggml-model-f16.bin /path
 > The [llama.cpp repository](https://github.com/ggerganov/llama.cpp) has
 > additional information on how to obtain and run specific models.
 
+### GPT2
+
+OpenAI's [GPT-2](https://jalammar.github.io/illustrated-gpt2/) architecture is
+also supported. The open-source family of
+[Cerebras](https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models/)
+models is built on this architecture.
+
 _Support for other open source models is currently planned. For models where
 weights can be legally distributed, this section will be updated with scripts to
 make the install process as user-friendly as possible. Due to the model's legal
@@ -133,9 +136,9 @@ Some additional things to try:

 ![Gif showcasing alpaca repl mode](./doc/resources/alpaca_repl_screencap.gif)
 
-- Sessions can be loaded (`--load-session`) or saved (`--save-session`) to file. To automatically load
-and save the same session, use `--persist-session`. This can be used to cache prompts to reduce load
-time, too:
+- Sessions can be loaded (`--load-session`) or saved (`--save-session`) to file.
+  To automatically load and save the same session, use `--persist-session`.
+  This can be used to cache prompts to reduce load time, too:
 
 ![Gif showcasing prompt caching](./doc/resources/prompt_caching_screencap.gif)
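
The GPT2 section added above pairs with the `gpt2_inference` example introduced by this PR. A plausible invocation, with the model path borrowed from the `.vscode/launch.json` configuration earlier in the diff (substitute your own GGML-format GPT-2/Cerebras model):

```shell
cargo run --release --example gpt2_inference --package gpt2 -- "$HOME/.ggml-models/cerebras-gpt-13b.bin"
```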
9 changes: 0 additions & 9 deletions generate-ggml-bindings/Cargo.toml

This file was deleted.
