koboldcpp.sh - The Mamba Multitool #554

Merged: 11 commits, Dec 10, 2023

Conversation

henk717

@henk717 henk717 commented Dec 9, 2023

I am very excited to add this: my specialty MicroMamba runtime, now for Koboldcpp!

This makes it much simpler for Linux users to get going. The script has 3 modes.
Mode 1: koboldcpp.sh as a substitute for koboldcpp.py. It ensures all the dependencies are installed (you do need wget and bzip2 for it to function), automatically compiles all the libs, and then launches koboldcpp with its own internal Python.
Mode 2: koboldcpp.sh dist. This automatically generates an experimental PyInstaller build. These builds seem to work across a wide range of Linux distributions if compiled on Ubuntu 18.04 (see my releases). If a user has an incompatible distro, chances are it will work if they run the command themselves to generate a binary against their own libs.
Mode 3: koboldcpp.sh rebuild. This triggers a recompile as well as an update of the runtime, which is useful after modifications or a git pull.
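
As a rough illustration of how one script can serve all three modes, here is a minimal sketch of dispatching on the first argument. It is not taken from koboldcpp.sh: the function name `select_mode` and the mode label `run` are assumptions made for this example, and the real script may be organised quite differently.

```shell
#!/bin/bash
# Hypothetical sketch of a three-mode launcher. The function name and the
# "run" label are illustrative and are not taken from koboldcpp.sh.

select_mode() {
    # Map the first argument to one of the three modes described above.
    case "${1:-}" in
        dist)    echo "dist" ;;     # generate an experimental PyInstaller build
        rebuild) echo "rebuild" ;;  # recompile and update the runtime
        *)       echo "run" ;;      # default: act as a substitute for koboldcpp.py
    esac
}

mode="$(select_mode "$@")"
echo "selected mode: $mode"
```

With this layout, any argument other than dist or rebuild falls through to the default mode, which is presumably where the remaining arguments would be handed on to koboldcpp itself.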

root and others added 11 commits December 9, 2023 19:58
I know it's technically amd64, but I don't want to cause confusion among nvidia users.
Unsure how this will behave in the pyinstaller build, but pocl ended up CPU only. With a bit of luck the pyinstaller uses the one from the actual system if compiled in a system without opencl, while conda now includes it for that specific system.
Missing this causes compile failures on some systems.
Ideally we find a better solution, but conda forces ICD and needs this for the successful compile. However, pyinstaller then embeds the ICD causing it to be limited to the system it was compiled for. By temporarily removing the ICD pyinstaller can't find it and everything remains functional. Ideally we do this on a pyinstaller level, but I could not find any good options to do so yet.
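
The idea in the ICD workaround (hide a file for the duration of a build step, then put it back) can be sketched generically as follows. This is only an illustration under assumptions: the helper name `run_with_file_hidden`, the `.hidden` suffix, and the demo file are made up for this example, and the commit text does not say which file the script actually moves or how.

```shell
#!/bin/bash
# Hypothetical sketch: hide a file while a command runs, then restore it.
# The helper name, the ".hidden" suffix and the demo paths are placeholders.

run_with_file_hidden() {
    local target="$1"; shift
    local backup="${target}.hidden"
    local status=0
    [ -e "$target" ] || return 1   # nothing to hide, so do not run the command
    mv "$target" "$backup"         # hide the file from the build step
    "$@" || status=$?              # run the command and remember its exit code
    mv "$backup" "$target"         # restore the file whether or not it succeeded
    return "$status"
}

# Demo with a temporary file standing in for the library to hide.
workdir="$(mktemp -d)"
touch "$workdir/fake_icd.so"
run_with_file_hidden "$workdir/fake_icd.so" ls "$workdir"   # file is hidden here
ls "$workdir"                                               # file is back here
```

Restoring the file even when the wrapped command fails keeps the environment usable for later compiles, which matters here because the compile step needs the ICD present.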
@LostRuins LostRuins changed the base branch from concedo to concedo_experimental December 10, 2023 13:27
Owner

@LostRuins LostRuins left a comment

merging

@LostRuins LostRuins merged commit e2cf3b7 into LostRuins:concedo_experimental Dec 10, 2023
YellowRoseCx added a commit to YellowRoseCx/koboldcpp-rocm that referenced this pull request Dec 12, 2023
commit 53b5ae02cb1b533b78302422951bcfdeca6e2738
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Tue Dec 12 12:08:29 2023 -0600

    mixtral fan service

commit 168b1d74e26d0321e2e89358303b6c33e8d7d33e
Merge: f13295b de15d4a6
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Tue Dec 12 12:00:52 2023 -0600

    Merge branch 'kcpp-rocm-mixtral2' into main2

commit de15d4a632939a685ec12fa17355298542facf15
Merge: 74acc54 ea4402b
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Tue Dec 12 11:45:19 2023 -0600

    Merge branch 'mixtral' into kcpp-rocm-mixtral

commit ea4402b
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Tue Dec 12 17:03:38 2023 +0200

    test-backend-ops : add one more sum_rows test

commit a51bc0c
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Tue Dec 12 15:55:42 2023 +0200

    metal : fix binary ops for ne10 % 4 != 0

commit 08eb991
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Tue Dec 12 14:14:15 2023 +0200

    metal : add cpy f16 -> f32 kernel

commit a742d9f
Author: slaren <slarengh@gmail.com>
Date:   Tue Dec 12 12:46:33 2023 +0100

    gguf-py : bump version

commit 6a419f4
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Tue Dec 12 13:04:33 2023 +0200

    convert : support safetensors format

commit 74acc54
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date:   Tue Dec 12 10:53:34 2023 +0800

    Revert "Hide hipBLAS (ROCm) if CuBLAS exists - vice versa"

    This reverts commit 4b854d4.

commit f1cbfab
Author: slaren <slarengh@gmail.com>
Date:   Mon Dec 11 20:02:55 2023 +0100

    convert : fix style

commit 7dc75e3
Author: slaren <slarengh@gmail.com>
Date:   Mon Dec 11 20:00:28 2023 +0100

    convert : use 1e6 rope_freq_base for mixtral

commit 296c945
Author: slaren <slarengh@gmail.com>
Date:   Mon Dec 11 16:53:25 2023 +0100

    cuda : fix mul_mat_id with multi gpu

commit 33e50f1
Author: slaren <slarengh@gmail.com>
Date:   Mon Dec 11 12:27:48 2023 +0100

    test-backend-ops : disable MOE test with thread sanitizer

commit ffda94c
Author: slaren <slarengh@gmail.com>
Date:   Mon Dec 11 12:15:31 2023 +0100

    test-backend-ops : simplify and disable slow tests to avoid CI timeout

commit 06581f2
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date:   Mon Dec 11 16:54:42 2023 +0800

    perf endpoint lets you monitor if the embedded horde worker has issues

commit fce971d
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date:   Mon Dec 11 16:17:10 2023 +0800

    do not build the clblast noavx2 binary if not on windows

commit 8cbaed1
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Mon Dec 11 08:55:16 2023 +0200

    llama : fix hard-coded number of experts

commit 4b854d4
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Sun Dec 10 22:49:35 2023 -0600

    Hide hipBLAS (ROCm) if CuBLAS exists - vice versa

commit b002981
Author: slaren <slarengh@gmail.com>
Date:   Mon Dec 11 02:43:52 2023 +0100

    test-backend-ops : fix dequantize block offset

commit f1380d7
Author: slaren <slarengh@gmail.com>
Date:   Sun Dec 10 22:58:31 2023 +0100

    test-backend-ops : add cpy from f32 -> all types test

commit 54d254b
Author: slaren <slarengh@gmail.com>
Date:   Sun Dec 10 21:52:11 2023 +0100

    test-backend-ops : cleanup, add moe test for batches

commit e2cf3b7
Author: henk717 <henk@henk.tech>
Date:   Sun Dec 10 14:30:17 2023 +0100

    koboldcpp.sh - The Mamba Multitool (LostRuins#554)

    * .sh script V1

    * koboldcpp.sh polish

    * koboldcpp.sh dist generator

    * Include html's in dist

    * RWKV in Linux Dist

    * Lower dependency requirements

    * Eliminate wget dependency

    * More distinct binary name

    I know its technically amd64, but I don't want to cause confusion among nvidia users.

    * Use System OpenCL

    Unsure how this will behave in the pyinstaller build, but pocl ended up CPU only. With a bit of luck the pyinstaller uses the one from the actual system if compiled in a system without opencl, while conda now includes it for that specific system.

    * Add cblas dependency

    Missing this causes compile failures on some system's

    * ICD workaround

    Ideally we find a better solution, but conda forces ICD and needs this for the successful compile. However, pyinstaller then embeds the ICD causing it to be limited to the system it was compiled for. By temporarily removing the ICD pyinstaller can't find it and everything remains functional. Ideally we do this on a pyinstaller level, but I could not find any good options to do so yet.

    ---------

    Co-authored-by: root <root@DESKTOP-DQ1QRAG>

commit 54ba263
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sun Dec 10 15:27:41 2023 +0200

    test-backend-ops : make experts more evenly probable (test_moe)

commit b0b83dd
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sun Dec 10 14:30:38 2023 +0200

    metal : fix ggml_mul_mat_id for F32

commit 65923a8
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sun Dec 10 14:17:46 2023 +0200

    convert : determine n_ctx correctly

commit 8614aa7
Author: slaren <slarengh@gmail.com>
Date:   Sun Dec 10 13:12:11 2023 +0100

    cuda : fix get_rows when ncols is odd

commit cefebb3
Author: slaren <slarengh@gmail.com>
Date:   Sun Dec 10 13:11:39 2023 +0100

    test-backend-ops : add moe test

commit e640cbe
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sun Dec 10 13:57:54 2023 +0200

    llama : add n_expert and n_expert_used to hparams + change quants

commit d1259b7
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sun Dec 10 13:00:13 2023 +0200

    llama : do not quantize expert gating tensors

commit 6cfb31f
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sun Dec 10 10:59:13 2023 +0200

    metal : add indirect mat-vec kernels for all quantization types

commit 016f9bb
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sun Dec 10 09:38:21 2023 +0200

    metal : fix ggml_get_rows to work with non-cont src1

commit 0710b0f
Author: slaren <slarengh@gmail.com>
Date:   Sat Dec 9 23:29:47 2023 +0100

    llama : offload missing ffn_moe_silu

commit 62b95f9
Author: slaren <slarengh@gmail.com>
Date:   Sat Dec 9 22:39:34 2023 +0100

    cuda : support non-contiguous src1 in get_rows

commit 2e4db48
Author: slaren <slarengh@gmail.com>
Date:   Sat Dec 9 22:38:22 2023 +0100

    ggml : update get_rows f16 and q

commit ac3f7d8
Author: slaren <slarengh@gmail.com>
Date:   Sat Dec 9 19:19:03 2023 +0100

    ggml : get_rows : support non-contiguos tensors with gaps, generalize up to 3D

commit 8c5b66e
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sat Dec 9 15:30:34 2023 +0200

    metal : reduce the kernel launches for ggml_mul_mat_id

commit 7e2006b
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sat Dec 9 14:24:58 2023 +0200

    metal : add/mul/div use general kernel when src1 not cont

commit 06dfde3
Author: slaren <slarengh@gmail.com>
Date:   Sat Dec 9 13:21:09 2023 +0100

    llama : add basic support for offloading moe with CUDA

commit 2cbcba8
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sat Dec 9 14:18:42 2023 +0200

    metal : add more general support for ggml_get_rows + tests

commit 9064b1c
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sat Dec 9 14:04:54 2023 +0200

    ggml : fix ggml_get_rows to take into account ne02 / ne11

commit ee8fb39
Author: slaren <slarengh@gmail.com>
Date:   Sat Dec 9 12:42:25 2023 +0100

    ggml : add n_as argument to ggml_mul_mat_id

commit 7372b62
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sat Dec 9 13:18:58 2023 +0200

    ggml : ggml_get_rows support 2D indexing [n_tokens, n_experts] (cpu only)

commit 8b185b7
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sat Dec 9 13:01:42 2023 +0200

    llama : fix expert weighting in the FFN

commit 7ea3695
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sat Dec 9 12:45:15 2023 +0200

    llama : first working version

commit af1a096
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sat Dec 9 12:07:39 2023 +0200

    llama : fix cur -> cur_expert

commit aedfad1
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sat Dec 9 11:47:40 2023 +0200

    llama : update graph to support MoE

commit 861cd67
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sat Dec 9 11:19:46 2023 +0200

    ggml : sync latest ggml_mul_mat_id

commit a3eefe9
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sat Dec 9 11:14:03 2023 +0200

    llama : model loading

commit d38e41e
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sat Dec 9 10:59:37 2023 +0200

    convert : fix n_ff typo

commit dff8cbe
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sat Dec 9 10:51:58 2023 +0200

    convert : support Mixtral as LLAMA arch

commit 7a69152
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date:   Fri Dec 8 21:06:32 2023 +0800

    lowvram var defaults

commit 7418bca
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date:   Fri Dec 8 19:20:30 2023 +0800

    up ver

commit c47bc28
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date:   Fri Dec 8 18:35:45 2023 +0800

    slight refactor for noscript ui

commit 7469f20
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date:   Fri Dec 8 18:16:14 2023 +0800

    use lowvram flag for offload qkv

commit ec21fa7
Merge: 930cdfb fe680e3
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date:   Fri Dec 8 17:42:26 2023 +0800

    Merge branch 'master' into concedo_experimental

    # Conflicts:
    #	.github/workflows/build.yml
    #	.gitignore
    #	CMakeLists.txt
    #	Makefile
    #	Package.swift
    #	README.md
    #	ggml-cuda.cu
    #	llama.cpp
    #	llama.h
    #	scripts/sync-ggml.sh
    #	tests/CMakeLists.txt

commit 930cdfb
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date:   Fri Dec 8 16:53:30 2023 +0800

    updated lite, added patch that links to noscript mode

commit fe680e3
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Thu Dec 7 22:26:54 2023 +0200

    sync : ggml (new ops, tests, backend, etc.) (ggerganov#4359)

    * sync : ggml (part 1)

    * sync : ggml (part 2, CUDA)

    * sync : ggml (part 3, Metal)

    * ggml : build fixes

    ggml-ci

    * cuda : restore lost changes

    * cuda : restore lost changes (StableLM rope)

    * cmake : enable separable compilation for CUDA

    ggml-ci

    * ggml-cuda : remove device side dequantize

    * Revert "cmake : enable separable compilation for CUDA"

    This reverts commit 09e35d0.

    * cuda : remove assert for rope

    * tests : add test-backend-ops

    * ggml : fix bug in ggml_concat

    * ggml : restore `ggml_get_n_tasks()` logic in `ggml_graph_plan()`

    * ci : try to fix macOS

    * ggml-backend : remove backend self-registration

    * ci : disable Metal for macOS cmake build

    ggml-ci

    * metal : fix "supports family" call

    * metal : fix assert

    * metal : print resource path

    ggml-ci

    ---------

    Co-authored-by: slaren <slarengh@gmail.com>

commit bcc0eb4
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Thu Dec 7 13:03:17 2023 +0200

    llama : per-layer KV cache + quantum K cache (ggerganov#4309)

    * per-layer KV

    * remove unnecessary copies

    * less code duplication, offload k and v separately

    * llama : offload KV cache per-layer

    * llama : offload K shift tensors

    * llama : offload for rest of the model arches

    * llama : enable offload debug temporarily

    * llama : keep the KV related layers on the device

    * llama : remove mirrors, perform Device -> Host when partial offload

    * common : add command-line arg to disable KV cache offloading

    * llama : update session save/load

    * llama : support quantum K cache (ggerganov#4312)

    * llama : support quantum K cache (wip)

    * metal : add F32 -> Q8_0 copy kernel

    * cuda : add F32 -> Q8_0 copy kernel

    ggml-ci

    * cuda : use mmv kernel for quantum cache ops

    * llama : pass KV cache type through API

    * llama : fix build

    ggml-ci

    * metal : add F32 -> Q4_0 copy kernel

    * metal : add F32 -> Q4_1 copy kernel

    * cuda : wip

    * cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels

    * llama-bench : support type_k/type_v

    * metal : use mm kernel only for quantum KV cache

    * cuda : add comment

    * llama : remove memory_f16 and kv_f16 flags

    ---------

    Co-authored-by: slaren <slarengh@gmail.com>

    * readme : add API change notice

    ---------

    Co-authored-by: slaren <slarengh@gmail.com>

commit 81bc921
Author: Hongyu Ouyang <96765450+casavaca@users.noreply.github.com>
Date:   Thu Dec 7 02:25:22 2023 -0800

    train : fix ggerganov#4227 (double free in examples/train-text-from-scratch/train-text-from-scratch.cpp) (ggerganov#4351)

    On commit b1108 (44c117f) xaedes added

        ggml_allocr * alloc = NULL;

        ... (many lines in between)

        if (alloc) {
            ggml_allocr_free(alloc);
        }

    Which is correct, but it's easy to lose context after many lines in between.

    On commit b1287 (0e76a899) xaedes made a big change. From here on, alloc is freed eagerly.

        alloc = ggml_allocr_new(...)
        ... (short lines of code)
        ggml_allocr_free(alloc)

    This happens a few times, but alloc is never set to NULL, and many lines below,
    we still have

        if (alloc) {
            ggml_allocr_free(alloc);
        }

    which causes a double-free.

commit 05cd6e5
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Wed Dec 6 20:21:59 2023 +0200

    server : recognize cache_prompt parameter in OAI API (ggerganov#4347)

commit c751152
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date:   Thu Dec 7 00:52:25 2023 +0800

    noscript mode is done

commit 12002d8
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date:   Wed Dec 6 17:51:08 2023 +0800

    very basic noscript mode

commit caa9249
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Wed Dec 6 10:41:03 2023 +0200

    common : fix compile warning

commit da5eaef
Author: stduhpf <stephduh@live.fr>
Date:   Wed Dec 6 09:08:17 2023 +0100

    speculative : support `--color` (ggerganov#4343)

    * speculative: add some colors

    * minor : add braces

    ---------

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

commit 5f6e0c0
Author: Marcus Dunn <51931484+MarcusDunn@users.noreply.github.com>
Date:   Tue Dec 5 10:55:12 2023 -1000

    grammar : pre-computed pieces + reserve mem + less string copies (ggerganov#4330)

    * reserve space for codepoints

    * improvement for the appended 0

    * used precomputed token text for grammar sample

    * reserve canidates_decoded

    * reserve canidates_grammar

    * remove candidates_decoded

    * Revert "remove candidates_decoded"

    This reverts commit 3773328.

    * changed decode_utf8 to take src by ref

commit 5aa365d
Author: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
Date:   Tue Dec 5 10:19:18 2023 -0700

    llama : allow overriding GGUF metadata when loading model (ggerganov#4092)

    * feat: Allow overriding GGUF metadata when loading model

    * Fix the one time GCC is stricter than clang about something

    * Step1

    * Refactor... basically everything!

    * Nuke obsolete GetArrayLen struct

    * simplify std::string specialization

    * Various cleanups

    Add informational output when overrides are applied

    Warn user when an override with the wrong type is specified

    * Fix broken logic for parsing bool KV overrides
    Fix issue where overrides didn't apply when key missing in GGUF metadata
    Resolve merge changes

    * llama : rearrange model params

    * Update new GET_KEY call

    Add note that metadata KV overrides aren't reflected in initial metadata KV info dump

    ---------

    Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

commit b6f952f
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date:   Tue Dec 5 21:08:10 2023 +0800

    improved exit logic

commit 52c8bc3
Author: MaggotHATE <clay1326@gmail.com>
Date:   Tue Dec 5 15:05:51 2023 +0500

    sampling : custom samplers order (ggerganov#4285)

    * Samplers sequence order w parameter

    * Cleaned commented code

    * Fixed formatting

    * Rewrote with unordered_map

    * Revert and rewrite, too many problems and safeguards would be needed

    * Fixed code style

    * Code style fixes according to review

    * More readable samplers input string, fixed help

    * Style fix in sampler_queue

    * Formatting fixes

    * Fixing whitespaces

commit e4b76bb
Author: kchro3 <62481661+kchro3@users.noreply.github.com>
Date:   Mon Dec 4 23:29:46 2023 -0800

    swift : revert compiler checks for swift package (ggerganov#4332)

commit 23b5e12
Author: Daniel Bevenius <daniel.bevenius@gmail.com>
Date:   Mon Dec 4 17:04:21 2023 +0100

    simple : update error message for KV cache check (ggerganov#4324)

    This commit updates the error message that is printed when the
    KV cache is not big enough to hold all the prompt and generated
    tokens. Specifically it removes the reference to n_parallel and
    replaces it with n_len.

    Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

commit d208995
Author: Miwa / Ensan <63481257+ensan-hcl@users.noreply.github.com>
Date:   Tue Dec 5 01:03:49 2023 +0900

    swift : fix concatenation method to avoid invalid UTF8 stringfication (ggerganov#4325)

commit 5c9f90c
Author: Miwa / Ensan <63481257+ensan-hcl@users.noreply.github.com>
Date:   Mon Dec 4 22:43:45 2023 +0900

    swift : fix prompt tokenization logic (ggerganov#4321)

commit a5a5839
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date:   Mon Dec 4 21:10:42 2023 +0800

    handle accidentally selecting a kcpps file as model instead

commit 4fa44e8
Author: Ikko Eltociear Ashimine <eltociear@gmail.com>
Date:   Mon Dec 4 16:57:35 2023 +0900

    grammar-parser : fix typo (ggerganov#4318)

    preceeding -> preceding

commit 8602f5a
Merge: ac36aee fbbc428
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date:   Sun Dec 3 22:00:14 2023 +0800

    Merge branch 'master' into concedo_experimental

commit fbbc428
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sun Dec 3 15:56:35 2023 +0200

    ggml : reuse ggml_get_n_tasks() in ggml_graph_plan() (ggerganov#4308)

    * ggml : fix soft max out-of-bounds access

    ggml-ci

    * ggml : reuse ggml_get_n_tasks() in ggml_graph_plan()

    ggml-ci

commit ac36aee
Merge: 48544cd 33e171d
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date:   Sun Dec 3 21:56:29 2023 +0800

    Merge branch 'master' into concedo_experimental

    # Conflicts:
    #	CMakeLists.txt
    #	Makefile

commit adf3de4
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sun Dec 3 15:56:22 2023 +0200

    ggml : fix soft max out-of-bounds access (ggerganov#4307)

    ggml-ci

commit 48544cd
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date:   Sun Dec 3 21:46:50 2023 +0800

    Revert "Revert "ggml : add ggml_soft_max_ext (ggerganov#4256)""

    This reverts commit a8e66ef.

commit 33e171d
Author: Ed Lee <edilee@mozilla.com>
Date:   Sun Dec 3 01:10:43 2023 -0800

    server : fix OpenAI API `stop` field to be optional (ggerganov#4299)

    (cherry picked from commit Mozilla-Ocho/llamafile@e8c92bc)

commit 6949b50
Author: Rickard Edén <rickardeden@gmail.com>
Date:   Sun Dec 3 10:03:25 2023 +0100

    py : add grammar to oai like api (ggerganov#4294)

commit d7b800b
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sun Dec 3 10:58:16 2023 +0200

    llama : pad KV cache size (ggerganov#4280)

    * llama : pad KV cache size to 32

    * metal : try to improve batched decoding

commit 6570a20
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date:   Sun Dec 3 15:44:53 2023 +0800

    token count includes ids

commit 5a7d312
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Fri Dec 1 20:39:12 2023 +0200

    llama : avoid using "optional" keyword (ggerganov#4283)

commit d5a1cbd
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Fri Dec 1 20:35:03 2023 +0200

    llama : support optional tensors (ggerganov#4283)

commit b220222
Author: Miwa / Ensan <63481257+ensan-hcl@users.noreply.github.com>
Date:   Sat Dec 2 03:19:45 2023 +0900

    swift : fix token_to_piece implementation (ggerganov#4278)

    * Fix token_to_piece implementation in Swift

    * Fix errors

commit 511f52c
Author: Jared Van Bortel <jared@nomic.ai>
Date:   Fri Dec 1 13:18:35 2023 -0500

    build : enable libstdc++ assertions for debug builds (ggerganov#4275)

commit 03562f3
Author: CausalLM <148736309+CausalLM@users.noreply.github.com>
Date:   Sat Dec 2 02:17:06 2023 +0800

    llama : support attention bias on LLaMA architecture (ggerganov#4283)

    * Support attention_bias on LLaMA architecture

    QKVO bias, should fix InternLM (ggerganov#3133) and works for LLaMAfied Qwen models (ggerganov#3743 (comment)).

    * check existence of qkvo bias while loading llama models

    Tested on LLaMA2, CUDA and CPU.

    * Update llama.cpp

commit 37c746d
Author: Shijie <821898965@qq.com>
Date:   Sat Dec 2 02:16:31 2023 +0800

    llama : add Qwen support (ggerganov#4281)

    * enable qwen to llama.cpp

    * llama : do not GPU split bias tensors

    ---------

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

commit 880f579
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Fri Dec 1 18:42:11 2023 +0200

    llama : fix integer overflow during quantization (ggerganov#4284)

    happens with multi-threaded quantization of Qwen-72B

    ggml-ci