
Loading model + merged adapter is different to model + adapter? #128

Closed
DorotheaMueller opened this issue Jul 23, 2024 · 6 comments · Fixed by #134

Comments

@DorotheaMueller

Loading a model with a merged PEFT adapter gives me different results than loading the model and merging the PEFT adapter on the fly.
Here is a minimal example:

Test Case 1: Simply loading adapter

import gc
import torch
from llm2vec import LLM2Vec
test_str = ["Test1", "Mr. and Mrs. Dursley of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much.", "Flying in a dream, stars by the pocketful."] 

l2v = LLM2Vec.from_pretrained(
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
    peft_model_name_or_path="McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
    merge_peft=False
)
arr_not_merged = l2v.encode(test_str)

del l2v
gc.collect()
torch.cuda.empty_cache()

Test Case 2: Merging Adapters

# Test case 2: merging adapter
l2v_merged = LLM2Vec.from_pretrained(
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
    peft_model_name_or_path="McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
    merge_peft=True
)
arr_merged = l2v_merged.encode(test_str)
dest_path = "mypathhere"
# Not all PEFT attributes are properly removed after merge-and-unload, so this reset is needed;
# otherwise, with _hf_peft_config_loaded = True, saving fails with
# "local variable 'active_adapters' referenced before assignment" (similar fix as in the library code).
l2v_merged.model._hf_peft_config_loaded = False
l2v_merged.save(dest_path)

Test Case 3: Loading Model with Merged Adapters

# Test case 3: loading merged adapter
loaded_model = LLM2Vec.from_pretrained(dest_path, enable_bidirectional=True)  # enable_bidirectional=True makes no difference here
arr_loaded_model = loaded_model.encode(test_str)

Sanity Check: Testing the Distance Between Embeddings

Now testing how different the embeddings are:

for v1, v2 in zip(arr_not_merged, arr_merged):
  print(torch.allclose(v1, v2, atol=1e-1))
  dist = torch.sqrt(torch.sum(torch.pow(torch.subtract(v1, v2), 2), dim=0)) 
  print(dist) # Yields distances around ~1

for v1, v2 in zip(arr_loaded_model, arr_merged):
  print(torch.allclose(v1, v2, atol=1e-1))
  dist = torch.sqrt(torch.sum(torch.pow(torch.subtract(v1, v2), 2), dim=0)) 
  print(dist) # Yields distances ~50 - 100
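
As an aside, the same Euclidean distance can be computed in one call with torch.dist(v1, v2) (p=2 by default).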

I would have expected the distance between the model with the merged adapter and the loaded model that already has the adapter merged to be as small as in the first check. What am I missing?

@DorotheaMueller
Author

DorotheaMueller commented Jul 24, 2024

I found the issue in the code. When loading a merged PEFT model from a local path, the start and end tokens are not appended properly (see llm2vec.py, line 138ff):

def prepare_for_tokenization(self, text):
    if self.model.config._name_or_path == "meta-llama/Meta-Llama-3-8B-Instruct":
        (...)

Since self.model.config._name_or_path points to the local path, the start and end tokens will not be appended. This affects all merged-and-saved models that are loaded again, not only my Llama example.
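
To illustrate (a quick check, assuming the save path from the example above): the loaded config records the local directory instead of the original hub ID, so the string comparison above never matches.

loaded_model = LLM2Vec.from_pretrained(dest_path, enable_bidirectional=True)
print(loaded_model.model.config._name_or_path)  # prints the local save path, not "meta-llama/Meta-Llama-3-8B-Instruct"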

For a temporary fix, it is sufficient to load the model and manually reset this attribute:

loaded_model = LLM2Vec.from_pretrained(dest_path, enable_bidirectional=True)
loaded_model.model.config._name_or_path = "meta-llama/Meta-Llama-3-8B-Instruct"

This behaviour is surprising; could you please document or fix it?

@vaibhavad
Collaborator

Hello @DorotheaMueller,

Thanks a lot for bringing this issue to our attention.

Regarding the comparison between Test case 1 and Test case 2 - this is a known issue that arises from the quantization + PEFT interaction, and it is beyond the scope of our library. All our evaluations were done with Test case 1, so if the goal is to reproduce our results, please use Test case 1. Also, the issue goes away when changing bfloat16 to float32: the resulting dist from the code below is then around 0.0002.

for v1, v2 in zip(arr_not_merged, arr_merged):
    print(torch.allclose(v1, v2, atol=1e-1))
    dist = torch.sqrt(torch.sum(torch.pow(torch.subtract(v1, v2), 2), dim=0))
    print(dist)  # ~1 with bfloat16, ~0.0002 with float32
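
For reference, a minimal sketch of the float32 variant of the loading call (same arguments as Test case 1 above, only the dtype changed):

l2v_fp32 = LLM2Vec.from_pretrained(
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
    peft_model_name_or_path="McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.float32,  # float32 instead of bfloat16
    merge_peft=True,
)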

vaibhavad added a commit that referenced this issue Jul 30, 2024
@vaibhavad vaibhavad linked a pull request Jul 30, 2024 that will close this issue
@vaibhavad vaibhavad reopened this Jul 30, 2024
@vaibhavad
Collaborator

@DorotheaMueller - Regarding test cases 2 and 3, this is a nice catch; thanks for bringing this to our attention. #134 fixes this.

@vaibhavad
Collaborator

You can use the latest version if you are building from source, or install llm2vec==0.2.2 from pip to get the latest changes.
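
For example:

pip install llm2vec==0.2.2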

@DorotheaMueller
Author

Great, thanks a lot for addressing and fixing the issue! 👍
Exactly, test case 1 was only there to sanity-check and compare the distances to test cases 2 and 3.

@vaibhavad
Collaborator

Closing as this is resolved now.
