Add LoRA support for Gemma #3050

Merged
WoosukKwon merged 13 commits into main from gemma-lora on Feb 28, 2024
Conversation

WoosukKwon
Collaborator

Closes #3044

@WoosukKwon requested a review from Yard1 on February 27, 2024 07:00
@Yard1
Collaborator

Yard1 commented Feb 27, 2024

Looks good, can we extend the test?

@WoosukKwon
Collaborator Author

@Yard1 Added. PTAL!
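(For context, a rough sketch of the kind of end-to-end LoRA check being discussed here, using vLLM's public API; the adapter path, adapter name, and prompt below are placeholders, not the ones used in the PR's actual test:)

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Hypothetical adapter path; the PR's test supplies its own LoRA fixture.
llm = LLM(model="google/gemma-7b", enable_lora=True)
outputs = llm.generate(
    ["Quote: Imagination is more"],
    SamplingParams(temperature=0.0, max_tokens=32),
    lora_request=LoRARequest("gemma-test-adapter", 1, "/path/to/gemma-lora"),
)
print(outputs[0].outputs[0].text)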

@WoosukKwon
Collaborator Author

@Yard1 The CI for this PR keeps failing in test_llama_lora_warmup, even though the PR seems unrelated to that test. Do you have any idea about this? I tried to debug it but didn't succeed.

@Yard1
Collaborator

Yard1 commented Feb 28, 2024

Hmm, running the entire thing as a new process should help with cleanup. Or maybe try --forked?
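(For reference, a hedged sketch of what per-test process isolation with the pytest-forked plugin looks like; the test body below is an illustrative placeholder, not the actual warmup test:)

# Requires the pytest-forked plugin. `pytest --forked` runs every test in its
# own forked subprocess; individual tests can also opt in via the marker, so
# GPU/global state is torn down when the child process exits.
import pytest

@pytest.mark.forked
def test_llama_lora_warmup():
    # Placeholder standing in for the real warmup test body.
    assert True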

@WoosukKwon
Collaborator Author

@Yard1 Adding --forked indeed resolved the issue, but it increased the LoRA test time from 13 minutes to 37 minutes.

@WoosukKwon
Collaborator Author

Let's merge the PR and optimize the testing time later. I think this is OK since the LoRA test is not required for most PRs, and the kernel test (which also uses --forked) takes a similar amount of time.

@WoosukKwon enabled auto-merge (squash) February 28, 2024 21:03
@WoosukKwon disabled auto-merge February 28, 2024 21:03
@WoosukKwon merged commit 929b4f2 into main Feb 28, 2024
20 of 22 checks passed
@WoosukKwon deleted the gemma-lora branch February 28, 2024 21:03
@zhaotyer
Contributor

Gemma "vocab_size" is 256000, Will the following restrictions have any impact? Should they be removed?
vllm/lora/layers.py

# Keep this in sync with csrc/punica/bgmv/bgmv_config.h
if 32000 < self.base_layer.vocab_size > 33024:
    raise ValueError(
        "When using LoRA, vocab size must be 32000 >= vocab_size <= 33024"
    )
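(Side note on the quoted check: the chained comparison 32000 < vocab_size > 33024 evaluates as "32000 < vocab_size and vocab_size > 33024", while the error message reads like a range check. A sketch of what the condition presumably intends, written here as an assumption rather than as the code that was merged:)

# Presumed intent (illustrative only): restrict LoRA to vocab sizes that the
# punica BGMV kernels in csrc/punica/bgmv/bgmv_config.h are built for.
if not (32000 <= self.base_layer.vocab_size <= 33024):
    raise ValueError(
        "When using LoRA, vocab size must satisfy "
        "32000 <= vocab_size <= 33024")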

@Yard1
Collaborator

Yard1 commented Feb 29, 2024

We are not supporting lm_head deltas for gemma

@zhaotyer
Contributor

zhaotyer commented Mar 4, 2024

> We are not supporting lm_head deltas for gemma

Thank you for your reply @Yard1

new_module = replace_submodule(
    self.model, module_name,
    from_layer(module, self.lora_slots, self.lora_config,
               self.model.config))
# (yard1): TODO make this more robust
if "lm_head" in module_name:
    sampler_module = self.model.get_submodule("sampler")
    new_module = replace_submodule(
        self.model, "sampler",
        from_layer_sampler(sampler_module, module, self.lora_slots,
                           self.lora_config, self.model.config))

The vocab_size check is only performed when lm_head is also handled by LoRA, but please take a look at this question: #3000
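(To connect the two points above: the sampler branch, and with it the vocab-size check, is only reached when lm_head is among a model's LoRA target modules. A rough sketch of that relationship; the exact supported_lora_modules list for Gemma is an assumption here, not quoted from the PR:)

# Hypothetical allow-list on the Gemma model class. Because lm_head (and
# embed_tokens) are left out, the "lm_head" branch above, and the vocab-size
# check in vllm/lora/layers.py, is never reached for Gemma.
supported_lora_modules = ["qkv_proj", "o_proj", "gate_up_proj", "down_proj"]
assert "lm_head" not in supported_lora_modules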
