
Commit

Merge branch 'main' of github.com:rustformers/llama-rs into feature/path-to-ggml
philpax committed Apr 6, 2023
2 parents cd489a2 + eea9fc7 commit dcc69c4
Showing 4 changed files with 53 additions and 11 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -14,7 +14,7 @@ follows:
from inside the `CREDITS.md` file.
- Run the bindgen script:
```shell
-$ cargo run --bin generate_ggml_bindings ggml-sys
+$ cargo run --bin generate-ggml-bindings ggml-sys
```
- Fix any compiler errors that pop up due to the new version of the bindings and
test the changes.
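
For context on what that script does: a generator like `generate-ggml-bindings` is typically a thin wrapper over the `bindgen` crate. A minimal sketch, in which the header and output paths are assumptions rather than the project's actual layout:

```rust
// Minimal sketch of a bindgen-based generator; paths are assumptions.
fn main() {
    let bindings = bindgen::Builder::default()
        // Parse the C header that declares the ggml API.
        .header("ggml-sys/ggml/ggml.h")
        .generate()
        .expect("failed to generate ggml bindings");
    // Write the generated Rust FFI declarations into the -sys crate.
    bindings
        .write_to_file("ggml-sys/src/bindings.rs")
        .expect("failed to write bindings");
}
```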
21 changes: 21 additions & 0 deletions Dockerfile
@@ -0,0 +1,21 @@
# Start with a rust alpine image
FROM rust:alpine3.17 as builder
# This is important, see https://github.com/rust-lang/docker-rust/issues/85
ENV RUSTFLAGS="-C target-feature=-crt-static"
# if needed, add additional dependencies here
RUN apk add --no-cache musl-dev
# set the workdir and copy the source into it
WORKDIR /app
COPY ./ /app
# do a release build
RUN cargo build --release --bin llama-cli
RUN strip target/release/llama-cli

# use a plain alpine image, the alpine version needs to match the builder
FROM alpine:3.17
# if needed, install additional dependencies here
RUN apk add --no-cache libgcc
# copy the binary into the final image
COPY --from=builder /app/target/release/llama-cli .
# set the binary as entrypoint
ENTRYPOINT ["/llama-cli"]
28 changes: 19 additions & 9 deletions README.md
@@ -80,13 +80,12 @@ kinds of sources.
After acquiring the weights, it is necessary to convert them into a format that
is compatible with ggml. To achieve this, follow the steps outlined below:

> **Warning**
>
> To run the Python scripts, a Python version of 3.9 or 3.10 is required. 3.11
> is unsupported at the time of writing.

-``` shell
+```shell
# Convert the model to f16 ggml format
python3 scripts/convert-pth-to-ggml.py /path/to/your/models/7B/ 1

@@ -95,7 +94,7 @@ python3 scripts/convert-pth-to-ggml.py /path/to/your/models/7B/ 1
```

> **Note**
>
> The [llama.cpp repository](https://github.com/ggerganov/llama.cpp) has
> additional information on how to obtain and run specific models. With some
> caveats:
@@ -104,17 +103,15 @@ python3 scripts/convert-pth-to-ggml.py /path/to/your/models/7B/ 1
> (versioned) ggml formats, but not the mmap-ready version that was [recently
> merged](https://github.com/ggerganov/llama.cpp/pull/613).

-*Support for other open source models is currently planned. For models where
+_Support for other open source models is currently planned. For models where
 weights can be legally distributed, this section will be updated with scripts to
 make the install process as user-friendly as possible. Due to the model's legal
 requirements, this is currently not possible with LLaMA itself and a more
-lengthy setup is required.*
+lengthy setup is required._

- https://github.com/rustformers/llama-rs/pull/85
- https://github.com/rustformers/llama-rs/issues/75


### Running

For example, try the following prompt:
@@ -147,6 +144,19 @@ Some additional things to try:
A modern-ish C toolchain is required to compile `ggml`. A C++ toolchain
should not be necessary.
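
That toolchain is invoked from `ggml-sys/build.rs` through the `cc` crate, which is also where the CPU flags changed further down this page are set. A minimal sketch of that pattern, with the source path assumed:

```rust
// build.rs: sketch of compiling ggml's C source with the `cc` crate.
fn main() {
    cc::Build::new()
        // The C translation unit to compile; this path is an assumption.
        .file("ggml/ggml.c")
        // Only add flags the detected compiler actually accepts.
        .flag_if_supported("-pthread")
        .compile("ggml");
}
```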

### Docker

```shell
# To build (this will take some time, go grab some coffee):
docker build -t llama-rs .

# To run with prompt:
docker run --rm --name llama-rs -it -v ${PWD}/data:/data -v ${PWD}/examples:/examples llama-rs -m data/gpt4all-lora-quantized-ggml.bin -p "Tell me how cool the Rust programming language is:"

# To run with prompt file and repl (will wait for user input):
docker run --rm --name llama-rs -it -v ${PWD}/data:/data -v ${PWD}/examples:/examples llama-rs -m data/gpt4all-lora-quantized-ggml.bin -f examples/alpaca_prompt.txt --repl
```

## Q&A

### Why did you do this?
13 changes: 12 additions & 1 deletion ggml-sys/build.rs
@@ -56,7 +56,18 @@ fn main() {
         }
         "aarch64" => {
             if compiler.is_like_clang() || compiler.is_like_gnu() {
-                build.flag("-mcpu=native");
+                if std::env::var("HOST") == std::env::var("TARGET") {
+                    build.flag("-mcpu=native");
+                } else {
+                    #[allow(clippy::single_match)]
+                    match target_os.as_str() {
+                        "macos" => {
+                            build.flag("-mcpu=apple-m1");
+                            build.flag("-mfpu=neon");
+                        }
+                        _ => {}
+                    }
+                }
                 build.flag("-pthread");
             }
         }
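The `HOST`/`TARGET` comparison above works because Cargo sets both environment variables for every build script, and they differ exactly when cross-compiling, which is the case where `-mcpu=native` would tune for the wrong machine. A standalone sketch of the check, with illustrative output:

```rust
// build.rs: sketch of detecting cross-compilation in a Cargo build script.
fn main() {
    // Cargo provides HOST and TARGET to build scripts at run time.
    let host = std::env::var("HOST").expect("Cargo sets HOST");
    let target = std::env::var("TARGET").expect("Cargo sets TARGET");
    if host == target {
        // Native build: tuning flags like -mcpu=native are safe.
        println!("cargo:warning=native build for {target}");
    } else {
        // Cross build: choose an explicit CPU for the target instead.
        println!("cargo:warning=cross-compiling {host} -> {target}");
    }
}
```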
