
Loading model + merged adapter is different to model + adapter? #128

Closed
DorotheaMueller opened this issue Jul 23, 2024 · 6 comments · Fixed by #134

Comments

@DorotheaMueller

Loading a model with a merged PEFT adapter gives me different results than loading the model and merging the PEFT adapter on the fly.
Here is a minimal example:

Test Case 1: Simply loading adapter

import gc
import torch
from llm2vec import LLM2Vec
test_str = ["Test1", "Mr. and Mrs. Dursley of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much.", "Flying in a dream, stars by the pocketful."] 

l2v = LLM2Vec.from_pretrained(
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
    peft_model_name_or_path="McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
    merge_peft=False
)
arr_not_merged = l2v.encode(test_str)

del l2v
gc.collect()
torch.cuda.empty_cache()

Test Case 2: Merging Adapters

# Test case 2: merging adapter
l2v_merged = LLM2Vec.from_pretrained(
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
    peft_model_name_or_path="McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
    merge_peft=True
)
arr_merged = l2v_merged.encode(test_str)
dest_path = "mypathhere"
# Not all PEFT attributes are properly removed after merge-and-unload, so this reset is needed;
# otherwise, with _hf_peft_config_loaded = True, saving fails with
# "local variable 'active_adapters' referenced before assignment" (similar fix as in the library code).
l2v_merged.model._hf_peft_config_loaded = False
l2v_merged.save(dest_path)

Test Case 3: Loading Model with Merged Adapters

# Test case 3: loading merged adapter
loaded_model = LLM2Vec.from_pretrained(dest_path, enable_bidirectional=True)  # enable_bidirectional=True makes no difference here
arr_loaded_model = loaded_model.encode(test_str)

Sanity Check: Testing the Distance Between Embeddings

Now testing how different the embeddings are:

for v1, v2 in zip(arr_not_merged, arr_merged):
  print(torch.allclose(v1, v2, atol=1e-1))
  dist = torch.sqrt(torch.sum(torch.pow(torch.subtract(v1, v2), 2), dim=0)) 
  print(dist) # Yields distances around ~1

for v1, v2 in zip(arr_loaded_model, arr_merged):
  print(torch.allclose(v1, v2, atol=1e-1))
  dist = torch.sqrt(torch.sum(torch.pow(torch.subtract(v1, v2), 2), dim=0)) 
  print(dist) # Yields distances ~50 - 100
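
As an aside, the same Euclidean distance can be computed in one call with torch.dist(v1, v2) (p=2 by default).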

I would have expected the distance between the model with the merged adapter and the loaded model that already has the adapter merged to be as small as in the first check. What am I missing?

@DorotheaMueller
Author

DorotheaMueller commented Jul 24, 2024

I found the issue in the code. When loading a merged PEFT model from a local path, the start and end tokens are not appended properly (see llm2vec.py, line 138ff):

def prepare_for_tokenization(self, text):
    if self.model.config._name_or_path == "meta-llama/Meta-Llama-3-8B-Instruct":
        (...)

Since self.model.config._name_or_path points to the local path, the start and end tokens will not be appended. This affects all merged-and-saved models that are loaded again, not only my Llama example.
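
To illustrate (a quick check, assuming the save path from the example above): the loaded config records the local directory instead of the original hub ID, so the string comparison above never matches.

loaded_model = LLM2Vec.from_pretrained(dest_path, enable_bidirectional=True)
print(loaded_model.model.config._name_or_path)  # prints the local save path, not "meta-llama/Meta-Llama-3-8B-Instruct"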

For a temporary fix, it is sufficient to load the model and manually reset this attribute:

loaded_model = LLM2Vec.from_pretrained(dest_path, enable_bidirectional=True)
loaded_model.model.config._name_or_path = "meta-llama/Meta-Llama-3-8B-Instruct"

This behaviour is surprising; could you please document or fix it?

@vaibhavad
Collaborator

Hello @DorotheaMueller,

Thanks a lot for bringing this issue to our attention.

Regarding the comparison between Test case 1 and Test case 2 - this is a known issue that arises from the quantization + PEFT interaction, and it is beyond the scope of our library. All our evaluations were done with Test case 1, so if the goal is to reproduce our results, please use Test case 1. Also, the issue goes away when changing bfloat16 to float32: the resulting dist from the code below is then around 0.0002.

for v1, v2 in zip(arr_not_merged, arr_merged):
    print(torch.allclose(v1, v2, atol=1e-1))
    dist = torch.sqrt(torch.sum(torch.pow(torch.subtract(v1, v2), 2), dim=0))
    print(dist)  # ~1 with bfloat16, ~0.0002 with float32
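
For reference, a minimal sketch of the float32 variant of the loading call (same arguments as Test case 1 above, only the dtype changed):

l2v_fp32 = LLM2Vec.from_pretrained(
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
    peft_model_name_or_path="McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.float32,  # float32 instead of bfloat16
    merge_peft=True,
)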

vaibhavad added a commit that referenced this issue Jul 30, 2024
@vaibhavad vaibhavad linked a pull request Jul 30, 2024 that will close this issue
@vaibhavad vaibhavad reopened this Jul 30, 2024
@vaibhavad
Collaborator

@DorotheaMueller - Regarding test cases 2 and 3, this is a nice catch; thanks for bringing this to our attention. #134 fixes this.

@vaibhavad
Collaborator

You can use the latest version if you are building from source, or install llm2vec==0.2.2 from pip to get the latest changes.
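
For example:

pip install llm2vec==0.2.2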

@DorotheaMueller
Author

Great, thanks a lot for addressing and fixing the issue! 👍
Exactly, test case 1 was only there to sanity-check and compare the distances to test cases 2 and 3.

@vaibhavad
Collaborator

Closing as this is resolved now.
