
The same situation as #31377 occurred when using Qwen/Qwen2-VL-7B-Instruct #33399

Open
toondata opened this issue Sep 10, 2024 · 25 comments

@toondata
System Info

  • transformers version: 4.45.0.dev0
  • Platform: macOS-14.6.1-arm64-arm-64bit
  • Python version: 3.12.4
  • Huggingface_hub version: 0.24.6
  • Safetensors version: 0.4.5
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:

Who can help?

@zucchini-nlp @amyer

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run this code after cloning transformers at the git hash I specified and installing it with pip install ./transformers:

import base64
import io

import torch
from PIL import Image
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

device = "mps"  # snippet is extracted from a class method where self.device == "mps"
model_path = ".models/Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    # attn_implementation="default"
).to(device)

min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28
processor = AutoProcessor.from_pretrained(
    model_path,
    min_pixels=min_pixels,
    max_pixels=max_pixels,
)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Extract text from pdf"},
        ],
    }
]

# image_data is a data-URI string, e.g. "data:image/jpeg;base64,<...>"
base64_data = image_data.split(",")[1]  # drop the "data:image/jpeg;base64," prefix
image_bytes = base64.b64decode(base64_data)
image = Image.open(io.BytesIO(image_bytes))
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(
    text=[text],
    images=[image],
).to(device)

# Inference: generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)  # originally returned from the class method

Expected behavior

File "/Users/dev/products/dev/workspaces/mixparse/llm/model/modelmanager.py", line 429, in _run_safetensors_inference
generated_ids = model.generate(**inputs, max_new_tokens=128)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/transformers/generation/utils.py", line 2015, in generate
result = self._sample(
^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/transformers/generation/utils.py", line 2965, in _sample
outputs = self(**model_inputs, return_dict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1683, in forward
inputs_embeds[image_mask] = image_embeds
RuntimeError: shape mismatch: value tensor of shape [630, 3584] cannot be broadcast to indexing result of shape [0, 3584]
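
For context, the failing line scatters the vision features into the positions of input_ids that hold the image placeholder token, so "value tensor of shape [630, 3584] ... indexing result of shape [0, 3584]" means the mask matched zero placeholder tokens in the prompt. A minimal diagnostic sketch, assuming the model and inputs from the snippet above are in scope and that the config exposes the placeholder id as image_token_id (as current Qwen2-VL configs do):

# Count image placeholder tokens in the tokenized prompt.
image_token_id = model.config.image_token_id  # id of <|image_pad|>
n_image_tokens = int((inputs.input_ids == image_token_id).sum())
print("image placeholder tokens in prompt:", n_image_tokens)
print("pixel patches sent to the vision tower:", tuple(inputs.pixel_values.shape))
# n_image_tokens should equal the number of rows in image_embeds
# (630 in the traceback above); 0 means the template/processor never
# inserted the placeholder, or the mask computation failed on this device.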

@toondata toondata added the bug label Sep 10, 2024
@amyer
amyer commented Sep 10, 2024 via email

@LysandreJik
Member

Thanks for the issue @toondata! You mention it's the same issue as #31377; does applying the same fix here solve the issue?

@toondata
Author

Sorry, my expression may have caused you misunderstanding. I encountered a problem similar to issue #31377. However, given the differences in implementation logic between idefics2 and Qwen/Qwen2-VL-7B-Instruct, I'm unsure whether the causes of these similar phenomena are the same. Despite downloading and compiling the latest mainline code, the issue remains unresolved.


@zucchini-nlp
Member

@toondata can you share the hash please? I can't find it. I tried to run your code on the latest main and got no errors, so can you double-check whether updating the version helps? Also, I see you have 'mps' commented out. We had several size-mismatch problems with LLaVA, so the error might be related to that. If you can try to run it on 'cuda', maybe in a Colab notebook, that would help locate the source of the error.

import requests
from PIL import Image
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model_path = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(model_path, torch_dtype=torch.bfloat16).to("cuda:0")

min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28
processor = AutoProcessor.from_pretrained(
    model_path,
    min_pixels=min_pixels,
    max_pixels=max_pixels,
)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Extract text from pdf"},
        ],
    }
]

image = Image.open(requests.get("https://www.ilankelman.org/stopsigns/australia.jpg", stream=True).raw)
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[text], images=[image]).to("cuda:0")

# Inference: generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

@toondata
Author

Here is my git hash: 96429e7.
I just updated to the latest version on the main branch and reinstalled transformers via pip install, but the result is still the same as before.
I don’t have a CUDA device at hand. My device is a Mac M3 Max, and I commented out “mps” in order to provide you with more accurate information. Could this issue be related to MPS?

@toondata
Author

I ran your code and image source with the device changed to MPS, and the issue remains the same, except the tensor that caused the issue has different dimensions.

File "/Users/dev/products/dev/workspaces/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1683, in forward
inputs_embeds[image_mask] = image_embeds
RuntimeError: shape mismatch: value tensor of shape [1247, 3584] cannot be broadcast to indexing result of shape [0, 3584]


@zucchini-nlp
Member

Maybe #30294 helps; it has a solution that worked for LLaVA with mps.

@toondata
Author

After looking at #30294, I feel the issue might not be related. I switched my local code to run on the CPU, and the problem is the same as with MPS.

image = Image.open(requests.get("https://www.ilankelman.org/stopsigns/australia.jpg", stream=True).raw)
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[text], images=[image]).to("cpu")

# Inference: generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)

File "/Users/dev/products/dev/workspaces/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1683, in forward
inputs_embeds[image_mask] = image_embeds
RuntimeError: shape mismatch: value tensor of shape [1247, 3584] cannot be broadcast to indexing result of shape [0, 3584]

@zucchini-nlp
Member

I could also get a Colab notebook working with the script; as per the linked issue, the error might also happen on CPU.

Let me see if I can get an mps to reproduce it, will need some time to dig

@zucchini-nlp zucchini-nlp self-assigned this Sep 11, 2024
@toondata
Author

Thank you very much, looking forward to the results of your digging.

@smallsmallwood

Met the same issue.


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@amyer

amyer commented Oct 14, 2024 via email

@mano3-1

mano3-1 commented Nov 2, 2024

hey @toondata,
I am facing a similar issue while fine-tuning the model. Were you able to fix it?

@amyer

amyer commented Nov 2, 2024 via email

@anshkumar

anshkumar commented Nov 22, 2024

Getting the same error:

Parameter Offload: Total persistent parameters: 686592 in 401 params
{'loss': 4.9398, 'grad_norm': 243.39051064272118, 'learning_rate': 1.4005602240896359e-08, 'epoch': 0.0}                                                  
  0%|                                                                                                                | 1/23800 [00:03<25:10:16,  3.81s/it][rank1]: Traceback (most recent call last):
[rank1]:   File "/home/sort/ved/sort/apple_l2/Qwen2-VL-Finetune/src/training/train.py", line 209, in <module>
[rank1]:     train()
[rank1]:   File "/home/sort/ved/sort/apple_l2/Qwen2-VL-Finetune/src/training/train.py", line 184, in train
[rank1]:     trainer.train()
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 2122, in train
[rank1]:     return inner_training_loop(
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 2474, in _inner_training_loop
[rank1]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 3572, in training_step
[rank1]:     loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 3625, in compute_loss
[rank1]:     outputs = model(**inputs)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank1]:     ret_val = func(*args, **kwargs)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1846, in forward
[rank1]:     loss = self.module(*inputs, **kwargs)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank1]:     return inner()
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank1]:     result = forward_call(*args, **kwargs)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/liger_kernel/transformers/model/qwen2_vl.py", line 109, in lce_forward
[rank1]:     inputs_embeds[image_mask] = image_embeds
[rank1]: RuntimeError: shape mismatch: value tensor of shape [49, 1536] cannot be broadcast to indexing result of shape [0, 1536]
W1122 16:10:57.817000 79821 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 79875 closing signal SIGTERM
E1122 16:10:58.032000 79821 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 1 (pid: 79876) of binary: /home/sort/miniconda3/envs/qwen2/bin/python
Traceback (most recent call last):
  File "/home/sort/miniconda3/envs/qwen2/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1165, in launch_command
    multi_gpu_launcher(args)
  File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/accelerate/commands/launch.py", line 799, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/distributed/run.py", line 910, in run
    elastic_launch(
  File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
src/training/train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-11-22_16:10:57
  host      : sort-X570S-AORUS-MASTER
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 79876)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

@amyer

amyer commented Nov 22, 2024 via email

@anshkumar

anshkumar commented Nov 23, 2024


My dataset was missing <image> token. Adding it fixed the issue.
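
For anyone hitting this while fine-tuning: a hedged sketch of the difference, using the "conversations" layout and "<image>" placeholder convention of the Qwen2-VL-Finetune repo from the traceback above (adapt the keys to your own data format):

# Sample whose text never references the image: preprocessing finds zero
# image-token positions while the vision tower still emits features,
# which triggers the [49, 1536] -> [0, 1536] mismatch above.
bad_sample = {
    "conversations": [
        {"from": "human", "value": "Describe the photo."},
        {"from": "gpt", "value": "A stop sign on a street corner."},
    ]
}

# With the placeholder present, preprocessing can expand "<image>" into the
# model's <|vision_start|><|image_pad|>...<|vision_end|> span, so the token
# and feature counts line up.
good_sample = {
    "conversations": [
        {"from": "human", "value": "<image>\nDescribe the photo."},
        {"from": "gpt", "value": "A stop sign on a street corner."},
    ]
}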

@knoel99

knoel99 commented Dec 2, 2024

Commenting to get updates.

I have a similar error using colqwen.

from colpali_engine.models import ColQwen2, ColQwen2Processor
from colpali_engine.utils.torch_utils import get_torch_device
from transformers.models.qwen2_vl import Qwen2VLForConditionalGeneration, Qwen2VLProcessor
2024-12-02 14:21:56,934 - ERROR - Error processing documents_img/image.png:
Image features and image tokens do not match: tokens: 0, features 2160
Traceback (most recent call last):
  File "/content/vision-rag/vision-rag/colqwen.py", line 146, in process_single_document
    outputs = self.model(**inputs, output_hidden_states=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/vision-rag/vision-rag/colqwen.py", line 47, in forward
    return Qwen2VLForConditionalGeneration.forward(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1690, in forward
    raise ValueError(
ValueError: Image features and image tokens do not match: tokens: 0, features 2160
/content/vision-rag# pip show transformers colpali-engine
Name: transformers
Version: 4.46.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: colpali_engine, peft, sentence-transformers
---
Name: colpali_engine
Version: 0.3.4
Summary: The code used to train and run inference with the ColPali architecture.
Home-page: https://github.com/illuin-tech/colpali
Author: 
Author-email: Manuel Faysse <manuel.faysse@illuin.tech>, Hugues Sibille <hugues.sibille@illuin.tech>, Tony Wu <tony.wu@illuin.tech>
License: 
Location: /usr/local/lib/python3.10/dist-packages
Requires: gputil, numpy, peft, pillow, requests, torch, transformers
Required-by: 

@zucchini-nlp
Member

Just a heads up, this issue is for the mps device, where the error is expected since we haven't yet verified Qwen2-VL inference on mps. If you are experiencing the issue on 'cuda' or 'cpu', please open a new issue and report your env via transformers-cli env along with the inputs you used to trigger the error 🤗

@Koesn

Koesn commented Dec 5, 2024

Yes, I also have this problem when running Qwen2-VL on vLLM. The error always happens when making parallel requests.

Edit:
Replaced vLLM with LMDeploy; sending multiple pictures with parallel requests now works normally.

@huggingface huggingface deleted a comment from code30x58 Dec 18, 2024
@xf12333

xf12333 commented Jan 3, 2025

(Quoting @knoel99's comment above.)

Hello, have you managed to solve this? I'm running into a similar error.
[screenshot attached]

@amyer

amyer commented Jan 3, 2025 via email


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@amyer

amyer commented Jan 28, 2025 via email
