llama.cpp: gemma: allow offloading the output tensor #1997

cebtenzzre · 2024-02-21T22:13:38Z

edit: currently segfaulting

ref ggerganov/llama.cpp#5646

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

manyoso · 2024-02-22T15:32:33Z

Does #5651 fix?

cebtenzzre · 2024-02-22T15:44:26Z

> Does #5651 fix?

Yes, it doesn't crash anymore. Fix is now included in this PR.

manyoso · 2024-02-22T16:21:51Z

If this has been reviewed upstream I don't want to presume to review it again here. Go ahead.

llama.cpp: gemma: allow offloading the output tensor

0eedc14

ref ggerganov/llama.cpp#5646

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

cebtenzzre requested review from manyoso and removed request for manyoso February 21, 2024 22:13

llama.cpp: add fix for crash caused by previous change

390e82b

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

cebtenzzre requested a review from manyoso February 22, 2024 15:43

cebtenzzre merged commit fc6c5ea into main Feb 22, 2024
6 of 17 checks passed

cebtenzzre deleted the gemma-output-offload branch February 22, 2024 19:06
