llama-cpp: pull upstream flake changes #289513

Merged: 1 commit, Feb 29, 2024

183 changes: 96 additions & 87 deletions in pkgs/by-name/ll/llama-cpp/package.nix
happysalada marked this conversation as resolved.

Contributor:

> you're removing the /lib and /include folders

I didn't notice that; where? Libraries and headers are installed by CMake.

Member (@mschwaig, Feb 18, 2024):

I built it locally and went by the output of tree to check. Embarrassingly, I read that wrong. I will update my comment.

Actually, the previous commit only has /bin, plus /lib for libggml_shared.so and libllama.so when built with .override { static = false; }. This PR gets rid of that flag, puts some extra things in /lib, and adds an /include (a sketch of the old override follows the listing):

result
├── bin
│   ├── baby-llama
│   ├── batched
│   ├── batched-bench
│   ├── beam-search
│   ├── benchmark
│   ├── convert-llama2c-to-ggml
│   ├── convert-lora-to-ggml.py
│   ├── convert.py
│   ├── embedding
│   ├── export-lora
│   ├── finetune
│   ├── gguf
│   ├── imatrix
│   ├── infill
│   ├── llama
│   ├── llama-bench
│   ├── llama-server
│   ├── llava-cli
│   ├── lookahead
│   ├── lookup
│   ├── parallel
│   ├── passkey
│   ├── perplexity
│   ├── quantize
│   ├── quantize-stats
│   ├── save-load-state
│   ├── simple
│   ├── speculative
│   ├── test-autorelease
│   ├── test-backend-ops
│   ├── test-grad0
│   ├── test-grammar-parser
│   ├── test-llama-grammar
│   ├── test-model-load-cancel
│   ├── test-quantize-fns
│   ├── test-quantize-perf
│   ├── test-rope
│   ├── test-sampling
│   ├── test-tokenizer-0-falcon
│   ├── test-tokenizer-0-llama
│   ├── test-tokenizer-1-bpe
│   ├── test-tokenizer-1-llama
│   ├── tokenize
│   └── train-text-from-scratch
├── include
│   ├── ggml-alloc.h
│   ├── ggml-backend.h
│   ├── ggml.h
│   └── llama.h
└── lib
    ├── cmake
    │   └── Llama
    │       ├── LlamaConfig.cmake
    │       └── LlamaConfigVersion.cmake
    ├── libggml_shared.so
    ├── libllama.so
    └── libllava_shared.so
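
For reference, a minimal sketch of how the shared libraries were opted into before this PR (the static flag shown here is exactly what this PR removes):

    # Before this PR: shared objects were opt-in via the now-removed `static` flag.
    # Assumes an imported nixpkgs bound to `pkgs`.
    llamaWithSharedLibs = pkgs.llama-cpp.override { static = false; };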

Contributor:

Tbh I have only checked the upstream flake so far. I'll have a look at both of the PRs during the week.

Contributor:

> The other side of that issue of renaming binaries is that, with what's currently on master, it's actually really annoying to switch between something like ollama with llama.cpp from the upstream flake and nixpkgs, because of the different binary names.

That's an interesting thought 👍🏻. Since we can in principle contribute both to nixpkgs and to the upstream Nix expressions, we could expose all of the relevant metadata in passthru (or even in the outputs if needed) to accommodate a smooth transition; a sketch follows.
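
A minimal sketch of that idea; the passthru.binaries attribute and its contents are hypothetical, not something this PR adds:

    # Hypothetical: expose the renamed entry points so downstream tools
    # (ollama wrappers, scripts written against the upstream flake) can
    # look the names up instead of hard-coding them.
    passthru.binaries = {
      main = "llama";          # upstream installs this as bin/main
      server = "llama-server"; # upstream installs this as bin/server
    };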

Contributor:

> put the package on their path, since it contains binaries with names like lookup and simple

To ensure this doesn't bite us in the future, we could propose a PR upstream adding a CMake option that prefixes all of the installed binaries, with the default prefix being the empty string; see the sketch below.
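
If upstream accepted such an option, the nixpkgs side might consume it roughly like this; LLAMA_BINARY_PREFIX is a hypothetical name, not an existing llama.cpp CMake variable:

    cmakeFlags = [
      # Hypothetical upstream option; an empty-string default would leave
      # other users' installs unchanged.
      (lib.cmakeFeature "LLAMA_BINARY_PREFIX" "llama-cpp-")
    ];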

Member:

That's a good idea. 👍

Contributor:

I'm maybe not following, but why should integrating the Vulkan PR be a blocker for merging this? I wouldn't mind if this were a separate PR, depending on whether it's easier to pull this together with or without the Vulkan enhancement.

@mschwaig's concerns do need to be resolved, indeed. Btw, @happysalada, do you still have time for this?

@@ -15,124 +15,132 @@
, openclSupport ? false
, clblast

, blasSupport ? !rocmSupport && !cudaSupport
, openblas
, blasSupport ? builtins.all (x: !x) [ cudaSupport metalSupport openclSupport rocmSupport vulkanSupport ]
, pkg-config
, metalSupport ? stdenv.isDarwin && stdenv.isAarch64 && !openclSupport
, patchelf
, static ? true # if false will build the shared objects as well
, vulkanSupport ? false
, mpiSupport ? false # Increases the runtime closure by ~700M
, vulkan-headers
, vulkan-loader
, ninja
, git
, mpi
}:

let
# It's necessary to consistently use backendStdenv when building with CUDA support,
# otherwise we get libstdc++ errors downstream.
# cuda imposes an upper bound on the gcc version, e.g. the latest gcc compatible with cudaPackages_11 is gcc11
effectiveStdenv = if cudaSupport then cudaPackages.backendStdenv else stdenv;
inherit (lib) cmakeBool cmakeFeature optionals;

darwinBuildInputs =
with darwin.apple_sdk.frameworks;
[
Accelerate
CoreVideo
CoreGraphics
]
++ optionals metalSupport [ MetalKit ];

cudaBuildInputs = with cudaPackages; [
cuda_cccl.dev # <nv/target>

# A temporary hack for reducing the closure size, remove once cudaPackages
# have stopped using lndir: https://github.com/NixOS/nixpkgs/issues/271792
Contributor:

It's on the Roadmap 🔮!

cuda_cudart.dev
cuda_cudart.lib
cuda_cudart.static
libcublas.dev
libcublas.lib
libcublas.static
];

rocmBuildInputs = with rocmPackages; [
clr
hipblas
rocblas
];

vulkanBuildInputs = [
vulkan-headers
vulkan-loader
];
in
effectiveStdenv.mkDerivation (finalAttrs: {
pname = "llama-cpp";
version = "2249";
version = "2294";

src = fetchFromGitHub {
owner = "ggerganov";
repo = "llama.cpp";
rev = "refs/tags/b${finalAttrs.version}";
hash = "sha256-ikJUToUbA60u/8azR6dPmPyodq/nQe5L2aotlYBclaE=";
hash = "sha256-uZi4Bj03PgfFV+jS5M+A1sMCWC/GMY5IyyrlR1b4Sh4=";
};

postPatch = ''
substituteInPlace ./ggml-metal.m \
--replace '[bundle pathForResource:@"ggml-metal" ofType:@"metal"];' "@\"$out/bin/ggml-metal.metal\";"
'';
Comment on lines 81 to 84 (Contributor):

Unrelated to your change: this is such an odd line and I wonder why the Nix build needs it. I'm no Metal-head, though.


nativeBuildInputs = [ cmake ] ++ lib.optionals blasSupport [ pkg-config ] ++ lib.optionals cudaSupport [
nativeBuildInputs = [ cmake ninja pkg-config git ]
++ optionals cudaSupport [
cudaPackages.cuda_nvcc

# TODO: Replace with autoAddDriverRunpath
# once https://github.com/NixOS/nixpkgs/pull/275241 has been merged
cudaPackages.autoAddOpenGLRunpathHook
];

buildInputs = lib.optionals effectiveStdenv.isDarwin
(with darwin.apple_sdk.frameworks; [
Accelerate
CoreGraphics
CoreVideo
Foundation
])
++ lib.optionals metalSupport (with darwin.apple_sdk.frameworks; [
MetalKit
])
++ lib.optionals cudaSupport (with cudaPackages; [
cuda_cccl.dev # <nv/target>

# A temporary hack for reducing the closure size, remove once cudaPackages
# have stopped using lndir: https://github.com/NixOS/nixpkgs/issues/271792
cuda_cudart.dev
cuda_cudart.lib
cuda_cudart.static
libcublas.dev
libcublas.lib
libcublas.static
]) ++ lib.optionals rocmSupport [
rocmPackages.clr
rocmPackages.hipblas
rocmPackages.rocblas
] ++ lib.optionals openclSupport [
clblast
] ++ lib.optionals blasSupport [
openblas
];
buildInputs = optionals effectiveStdenv.isDarwin darwinBuildInputs
++ optionals cudaSupport cudaBuildInputs
++ optionals mpiSupport [ mpi ]
++ optionals openclSupport [ clblast ]
++ optionals rocmSupport rocmBuildInputs
++ optionals vulkanSupport vulkanBuildInputs;
Comment on lines +95 to +100 (Contributor):

Gorgeous.


cmakeFlags = [
"-DLLAMA_NATIVE=OFF"
"-DLLAMA_BUILD_SERVER=ON"
# -march=native is non-deterministic; override with platform-specific flags if needed
(cmakeBool "LLAMA_NATIVE" false)
(cmakeBool "BUILD_SHARED_SERVER" true)
(cmakeBool "BUILD_SHARED_LIBS" true)
(cmakeBool "BUILD_SHARED_LIBS" true)
(cmakeBool "LLAMA_BLAS" blasSupport)
(cmakeBool "LLAMA_CLBLAST" openclSupport)
(cmakeBool "LLAMA_CUBLAS" cudaSupport)
(cmakeBool "LLAMA_HIPBLAS" rocmSupport)
(cmakeBool "LLAMA_METAL" metalSupport)
(cmakeBool "LLAMA_MPI" mpiSupport)
(cmakeBool "LLAMA_VULKAN" vulkanSupport)
]
++ lib.optionals metalSupport [
"-DCMAKE_C_FLAGS=-D__ARM_FEATURE_DOTPROD=1"
"-DLLAMA_METAL=ON"
]
++ lib.optionals cudaSupport [
"-DLLAMA_CUBLAS=ON"
]
++ lib.optionals rocmSupport [
"-DLLAMA_HIPBLAS=1"
"-DCMAKE_C_COMPILER=hipcc"
"-DCMAKE_CXX_COMPILER=hipcc"
"-DCMAKE_POSITION_INDEPENDENT_CODE=ON"
]
++ lib.optionals openclSupport [
"-DLLAMA_CLBLAST=ON"
]
++ lib.optionals blasSupport [
"-DLLAMA_BLAS=ON"
"-DLLAMA_BLAS_VENDOR=OpenBLAS"
]
++ lib.optionals (!static) [
(lib.cmakeBool "BUILD_SHARED_LIBS" true)
];

installPhase = ''
runHook preInstall

mkdir -p $out/bin
${lib.optionalString (!static) ''
mkdir $out/lib
cp libggml_shared.so $out/lib
cp libllama.so $out/lib
''}

for f in bin/*; do
test -x "$f" || continue
${lib.optionalString (!static) ''
${patchelf}/bin/patchelf "$f" --set-rpath "$out/lib"
''}
cp "$f" $out/bin/llama-cpp-"$(basename "$f")"
done

${lib.optionalString metalSupport "cp ./bin/ggml-metal.metal $out/bin/ggml-metal.metal"}

runHook postInstall
++ optionals cudaSupport [
(
with cudaPackages.flags;
cmakeFeature "CMAKE_CUDA_ARCHITECTURES" (
builtins.concatStringsSep ";" (map dropDot cudaCapabilities)
)
)
]
++ optionals rocmSupport [
(cmakeFeature "CMAKE_C_COMPILER" "hipcc")
(cmakeFeature "CMAKE_CXX_COMPILER" "hipcc")

# Build all targets supported by rocBLAS. When updating search for TARGET_LIST_ROCM
# in https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/CMakeLists.txt
# and select the line that matches the current nixpkgs version of rocBLAS.
# Should likely use `rocmPackages.clr.gpuTargets`.
"-DAMDGPU_TARGETS=gfx803;gfx900;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-;gfx940;gfx941;gfx942;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102"
Comment on lines +131 to +132 (Contributor):

The comment is from the upstream flake; I guess it's OK to just "pull those changes", but consider following up on the suggestion from the comments in a separate commit.

Contributor Author:

The xnack- present on some of these would make it difficult to abstract away. Happy to follow up with a separate PR once this has been resolved exactly and tested; a sketch of that follow-up is below.
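
A minimal sketch of that follow-up, assuming rocmPackages.clr.gpuTargets already lists the wanted targets (the xnack variants mentioned above may still need special-casing):

    # Sketch only: derive the target list from nixpkgs instead of hard-coding it.
    (cmakeFeature "AMDGPU_TARGETS"
      (builtins.concatStringsSep ";" rocmPackages.clr.gpuTargets))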

]
++ optionals metalSupport [ (cmakeFeature "CMAKE_C_FLAGS" "-D__ARM_FEATURE_DOTPROD=1") ]
++ optionals blasSupport [ (cmakeFeature "LLAMA_BLAS_VENDOR" "OpenBLAS") ];

# upstream plans on adding targets at the cmake level; remove these
# additional steps after that
postInstall = ''
mv $out/bin/main $out/bin/llama
Member:

this breaks meta.mainProgram

Contributor (@SomeoneSerge, Feb 18, 2024):

> really annoying to switch between something like ollama with llama.cpp from the upstream flake and nixpkgs, because of the different binary names

To be fair, it was upstream's choice to use non-composable/generic names (these won't do for nixpkgs, imo). 👍🏻 that the name should stay consistent with whatever meta.mainProgram is kept.

Contributor:

How about this?

Suggested change:

-    mv $out/bin/main $out/bin/llama
+    mv $out/bin/main $out/bin/${finalAttrs.finalPackage.meta.mainProgram}
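
For context, a minimal sketch of the finalAttrs fixpoint the suggestion relies on (pname and binary names here are made up):

    stdenv.mkDerivation (finalAttrs: {
      pname = "example";
      meta.mainProgram = "example-cli";
      # finalAttrs.finalPackage is the resulting derivation itself, so the
      # rename below always tracks whatever meta.mainProgram is set to.
      postInstall = ''
        mv $out/bin/main $out/bin/${finalAttrs.finalPackage.meta.mainProgram}
      '';
    })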

mv $out/bin/server $out/bin/llama-server
mkdir -p $out/include
cp $src/llama.h $out/include/
'';

passthru.updateScript = nix-update-script {
@@ -144,9 +152,10 @@
description = "Port of Facebook's LLaMA model in C/C++";
homepage = "https://github.com/ggerganov/llama.cpp/";
license = licenses.mit;
Contributor:

The description also wants to be updated from upstream: "Inference of LLaMA model in pure C/C++".

I like the pnameSuffix and descriptionSuffix things that upstream does as well, which I note you chose not to port over; a sketch is below.
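
A rough sketch of porting that over (the suffix scheme mirrors the upstream flake's idea; the exact spellings there may differ):

    let
      # Suffix the package name per enabled backend, as the upstream flake does.
      pnameSuffix =
        lib.optionalString cudaSupport "-cuda"
        + lib.optionalString rocmSupport "-rocm"
        + lib.optionalString vulkanSupport "-vulkan";
    in
    {
      pname = "llama-cpp${pnameSuffix}";
    }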

Contributor Author:

I was a little lazy and wasn't 100% sure it was going to be accepted, so I wanted to make the smallest change possible. How about we do that in a follow-up PR?

Contributor:

Yeah, I have no strong feelings about it being in this PR.

mainProgram = "llama-cpp-main";
maintainers = with maintainers; [ dit7ya elohmeier ];
broken = (effectiveStdenv.isDarwin && effectiveStdenv.isx86_64) || lib.count lib.id [openclSupport blasSupport rocmSupport cudaSupport] == 0;
mainProgram = "llama";
maintainers = with maintainers; [ dit7ya elohmeier philiptaron ];
platforms = platforms.unix;
badPlatforms = optionals (cudaSupport || openclSupport) lib.platforms.darwin;
broken = (metalSupport && !effectiveStdenv.isDarwin);
};
})
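
For reference, a minimal sketch of selecting one of the backends this PR wires up (the attribute path pkgs.llama-cpp is assumed):

    # Sketch: enable the new Vulkan backend via the support flag added above.
    llama-cpp-vulkan = pkgs.llama-cpp.override { vulkanSupport = true; };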