"Token pattern not found in the list" error #24

Open

nshen7 opened this issue Jul 21, 2024 · 4 comments

nshen7 commented Jul 21, 2024

Hi there,

I get a "Token pattern not found in the list" error when running the model under torch.no_grad(). Would you take a look at this, please? Many thanks! The code and error message are below:

input_ids_a, input_ids_b, labels = next(TRAIN_DATASET)

with torch.no_grad():
    outputs_a = model(input_ids=input_ids_a)
    outputs_b = model(input_ids=input_ids_b)

ValueError Traceback (most recent call last)
File :3

File /usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None

File /usr/local/lib/python3.10/site-packages/peft/peft_model.py:1238, in PeftModelForSequenceClassification.forward(self, input_ids, attention_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict, task_ids, **kwargs)
1236 if peft_config.peft_type == PeftType.POLY:
1237 kwargs["task_ids"] = task_ids
-> 1238 return self.base_model(
1239 input_ids=input_ids,
1240 attention_mask=attention_mask,
1241 inputs_embeds=inputs_embeds,
1242 labels=labels,
1243 output_attentions=output_attentions,
1244 output_hidden_states=output_hidden_states,
1245 return_dict=return_dict,
1246 **kwargs,
1247 )
1249 batch_size = _get_batch_size(input_ids, inputs_embeds)
1250 if attention_mask is not None:
1251 # concat prompt attention mask

File /usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None

File /usr/local/lib/python3.10/site-packages/peft/tuners/tuners_utils.py:179, in BaseTuner.forward(self, *args, **kwargs)
178 def forward(self, *args: Any, **kwargs: Any):
--> 179 return self.model.forward(*args, **kwargs)

File ~/.cache/huggingface/modules/transformers_modules/RLHFlow/ArmoRM-Llama3-8B-v0.1/97bc38d5bc709b850e236ef5f03589f6098552c0/modeling_custom.py:152, in LlamaForRewardModelWithGating.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
149 assert hidden_states.shape == (batch_size, self.config.hidden_size)
150 rewards = self.regression_layer(hidden_states)
--> 152 gating_token_positions = [find_token_for_gating(ids.tolist()) for ids in input_ids]
153 prompt_embedding = tokens_hidden_states[dummy_iterator, gating_token_positions, :]
154 gating_output = self.gating(prompt_embedding)

File ~/.cache/huggingface/modules/transformers_modules/RLHFlow/ArmoRM-Llama3-8B-v0.1/97bc38d5bc709b850e236ef5f03589f6098552c0/modeling_custom.py:152, in <listcomp>(.0)
149 assert hidden_states.shape == (batch_size, self.config.hidden_size)
150 rewards = self.regression_layer(hidden_states)
--> 152 gating_token_positions = [find_token_for_gating(ids.tolist()) for ids in input_ids]
153 prompt_embedding = tokens_hidden_states[dummy_iterator, gating_token_positions, :]
154 gating_output = self.gating(prompt_embedding)

File ~/.cache/huggingface/modules/transformers_modules/RLHFlow/ArmoRM-Llama3-8B-v0.1/97bc38d5bc709b850e236ef5f03589f6098552c0/modeling_custom.py:47, in find_token_for_gating(lst)
45 if lst[j:j + token_pattern_len] == token_pattern:
46 return j
---> 47 raise ValueError("Token pattern not found in the list.")

ValueError: Token pattern not found in the list.

Haoxiang-Wang (Collaborator) commented

Have you updated transformers to the newest version? @nshen7

nshen7 (Author) commented Jul 22, 2024

I was using version 4.41.2, but the error persists with the newest version.

To add context: I was using the default tokenizer from RLHFlow/ArmoRM-Llama3-8B-v0.1.

tokenizer = AutoTokenizer.from_pretrained(CFG.MODEL_NAME, trust_remote_code=CFG.TRUST_REMOTE)
INPUT_IDS_A = tokenizer.apply_chat_template(
    train['message_a'].tolist(), 
    tokenize=True,
    padding=True, 
    truncation=True, 
    max_length=CFG.MAX_LENGTH, 
    return_tensors='np'
)
INPUT_IDS_B = tokenizer.apply_chat_template(
    train['message_b'].tolist(), 
    tokenize=True,
    padding=True, 
    truncation=True, 
    max_length=CFG.MAX_LENGTH, 
    return_tensors='np'
)
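
As a quick sanity check, something like the following can verify whether the gating pattern survives tokenization and truncation (a minimal sketch, not from the original script; the pattern string is copied from modeling_custom.py, and INPUT_IDS_A is the array built above):

# Sketch: count tokenized rows that no longer contain the gating pattern
# that modeling_custom.py searches for. Assumes `tokenizer` and INPUT_IDS_A
# from above; the pattern string is copied from the model code.
pattern_ids = tokenizer.encode(
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    add_special_tokens=False,
)

def contains_pattern(ids, pattern):
    # Sliding-window search for `pattern` inside the token-id list `ids`.
    n = len(pattern)
    return any(ids[j:j + n] == pattern for j in range(len(ids) - n + 1))

bad_rows = [i for i, row in enumerate(INPUT_IDS_A)
            if not contains_pattern(row.tolist(), pattern_ids)]
print(f"{len(bad_rows)} of {len(INPUT_IDS_A)} rows are missing the gating pattern")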

And I was trying to fine-tune the gating layer of the model with LoRA:

lora_config = LoraConfig(
    r=CFG.LORA_RANK,               # dimension of the low-rank matrices
    lora_alpha=CFG.LORA_ALPHA,     # scaling factor for LoRA activations vs. pre-trained weights
    lora_dropout=CFG.DROPOUT,
    bias='none',
    inference_mode=False,
    task_type=TaskType.SEQ_CLS,
    target_modules=CFG.LORA_MODULES,  # only the gating-network layers
)
model = get_peft_model(base_model, lora_config)
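
To confirm that LoRA left only the gating layers trainable, PEFT's built-in helper can be used (a quick check, not part of the original script; assumes `model` from above):

# Print trainable vs. total parameter counts; with the target_modules above,
# only the gating-layer LoRA adapters should be trainable.
model.print_trainable_parameters()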

Configs:

class CFG:
    NUM_EPOCHS = 1
    BATCH_SIZE = 16
    DROPOUT = 0.05 
    MODEL_NAME = "RLHFlow/ArmoRM-Llama3-8B-v0.1"
    TRUST_REMOTE = True
    SEED = 2024 
    MAX_LENGTH = 1024 
    NUM_WARMUP_STEPS = 128
    LR_MAX = 5e-5 
    NUM_LABELS = 3 
    GATING_TEMP = 10
    LORA_RANK = 4
    LORA_ALPHA = 8
    LORA_MODULES = ["gating.layers.0", "gating.layers.1", "gating.layers.2", "gating.layers.3"]

glorgao commented Jul 31, 2024

Same error.

I believe the cause is the function find_token_for_gating() defined in modeling_custom.py, which searches for the token pattern:
<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n

The deeper reason lies on the dataset side: the code always fails on samples whose "response" is null. For example, the sample with prompt_id c0dfe114bb80a25990c193539a0e8e43557ba7236fd00e71731b852b4e7849a9 in the ultrafeedback_binarized dataset:
https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized/viewer/default/train_prefs?q=Maybe+using+FILTER+or+VLOOKUP&row=24707

I believe the solution is to remove these bad samples; a sketch is below.
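
Along those lines, a minimal filtering sketch (hypothetical messages_list of chat-format conversations; the pattern string is copied from modeling_custom.py, and tokenizer and CFG.MAX_LENGTH come from the earlier comments):

# Drop conversations that have a null/empty turn, or whose tokenization no
# longer contains the gating pattern after truncation.
pattern_ids = tokenizer.encode(
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    add_special_tokens=False,
)

def keeps_gating_pattern(messages):
    # Reject conversations with a null or empty "content" field.
    if not all(m.get("content") for m in messages):
        return False
    ids = tokenizer.apply_chat_template(
        messages, tokenize=True, truncation=True, max_length=CFG.MAX_LENGTH
    )
    # Sliding-window search, mirroring what find_token_for_gating() expects.
    n = len(pattern_ids)
    return any(ids[j:j + n] == pattern_ids for j in range(len(ids) - n + 1))

clean_messages = [m for m in messages_list if keeps_gating_pattern(m)]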

zjuruizhechen commented

Same error.
