
The same situation as #31377 occurred when using Qwen/Qwen2-VL-7B-Instruct #33399

Open
toondata opened this issue Sep 10, 2024 · 25 comments

@toondata
System Info

  • transformers version: 4.45.0.dev0
  • Platform: macOS-14.6.1-arm64-arm-64bit
  • Python version: 3.12.4
  • Huggingface_hub version: 0.24.6
  • Safetensors version: 0.4.5
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:

Who can help?

@zucchini-nlp @amyer

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run this code after cloning transformers at the git hash I specified and installing it with pip install ./transformers:

import base64
import io

import torch
from PIL import Image
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

device = "mps"  # snippet is extracted from a class method where self.device == "mps"
model_path = ".models/Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    # attn_implementation="default"
).to(device)

min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28
processor = AutoProcessor.from_pretrained(
    model_path,
    min_pixels=min_pixels,
    max_pixels=max_pixels,
)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Extract text from pdf"},
        ],
    }
]

# image_data is a data-URI string, e.g. "data:image/jpeg;base64,<...>"
base64_data = image_data.split(",")[1]  # drop the "data:image/jpeg;base64," prefix
image_bytes = base64.b64decode(base64_data)
image = Image.open(io.BytesIO(image_bytes))
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(
    text=[text],
    images=[image],
).to(device)

# Inference: generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)  # originally returned from the class method

Expected behavior

File "/Users/dev/products/dev/workspaces/mixparse/llm/model/modelmanager.py", line 429, in _run_safetensors_inference
generated_ids = model.generate(**inputs, max_new_tokens=128)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/transformers/generation/utils.py", line 2015, in generate
result = self._sample(
^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/transformers/generation/utils.py", line 2965, in _sample
outputs = self(**model_inputs, return_dict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1683, in forward
inputs_embeds[image_mask] = image_embeds
RuntimeError: shape mismatch: value tensor of shape [630, 3584] cannot be broadcast to indexing result of shape [0, 3584]
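
For context, the failing line scatters the vision features into the positions of input_ids that hold the image placeholder token, so "value tensor of shape [630, 3584] ... indexing result of shape [0, 3584]" means the mask matched zero placeholder tokens in the prompt. A minimal diagnostic sketch, assuming the model and inputs from the snippet above are in scope and that the config exposes the placeholder id as image_token_id (as current Qwen2-VL configs do):

# Count image placeholder tokens in the tokenized prompt.
image_token_id = model.config.image_token_id  # id of <|image_pad|>
n_image_tokens = int((inputs.input_ids == image_token_id).sum())
print("image placeholder tokens in prompt:", n_image_tokens)
print("pixel patches sent to the vision tower:", tuple(inputs.pixel_values.shape))
# n_image_tokens should equal the number of rows in image_embeds
# (630 in the traceback above); 0 means the template/processor never
# inserted the placeholder, or the mask computation failed on this device.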

@toondata toondata added the bug label Sep 10, 2024
@amyer
amyer commented Sep 10, 2024 via email

@LysandreJik
Member

Thanks for the issue @toondata! You mention it's the same issue as #31377; does applying the same fix here solve the issue?

@toondata
Author

Sorry, my expression may have caused you misunderstanding. I encountered a problem similar to issue #31377. However, given the differences in implementation logic between idefics2 and Qwen/Qwen2-VL-7B-Instruct, I'm unsure whether the causes of these similar phenomena are the same. Despite downloading and compiling the latest mainline code, the issue remains unresolved.


@zucchini-nlp
Member

@toondata can you share the hash please? I can't find it. I tried to run your code on the latest main and got no errors, so can you double-check whether updating the version helps? Also, I see you have 'mps' commented out. We had several size-mismatch problems with LLaVA, so the error might be related to that. If you can try to run it on 'cuda', maybe in a Colab notebook, that would help locate the source of the error.

import requests
from PIL import Image
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model_path = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(model_path, torch_dtype=torch.bfloat16).to("cuda:0")

min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28
processor = AutoProcessor.from_pretrained(
    model_path,
    min_pixels=min_pixels,
    max_pixels=max_pixels,
)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Extract text from pdf"},
        ],
    }
]

image = Image.open(requests.get("https://www.ilankelman.org/stopsigns/australia.jpg", stream=True).raw)
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[text], images=[image]).to("cuda:0")

# Inference: generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

@toondata
Author

Here is my git hash: 96429e7.
I just updated to the latest version on the main branch and reinstalled transformers via pip install, but the result is still the same as before.
I don’t have a CUDA device at hand. My device is a Mac M3 Max, and I commented out “mps” in order to provide you with more accurate information. Could this issue be related to MPS?

@toondata
Author

I ran your code and image source with the device changed to MPS, and the issue remains the same, except the tensor that caused the issue has different dimensions.

File "/Users/dev/products/dev/workspaces/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1683, in forward
inputs_embeds[image_mask] = image_embeds
RuntimeError: shape mismatch: value tensor of shape [1247, 3584] cannot be broadcast to indexing result of shape [0, 3584]


@zucchini-nlp
Member

Maybe #30294 helps; it has a solution that worked for LLaVA with mps.

@toondata
Author

After looking at #30294, I feel the issue might not be related. I switched my local code to run on the CPU, and the problem is the same as with MPS.

image = Image.open(requests.get("https://www.ilankelman.org/stopsigns/australia.jpg", stream=True).raw)
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[text], images=[image]).to("cpu")

# Inference: generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)

File "/Users/dev/products/dev/workspaces/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1683, in forward
inputs_embeds[image_mask] = image_embeds
RuntimeError: shape mismatch: value tensor of shape [1247, 3584] cannot be broadcast to indexing result of shape [0, 3584]

@zucchini-nlp
Member

I could also get a Colab notebook working with the script; as per the linked issue, the error might also happen on CPU.

Let me see if I can get an mps to reproduce it, will need some time to dig

@zucchini-nlp zucchini-nlp self-assigned this Sep 11, 2024
@toondata
Author

Thank you very much, looking forward to the results of your digging.

@smallsmallwood

Met the same issue.


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@amyer

amyer commented Oct 14, 2024 via email

@mano3-1

mano3-1 commented Nov 2, 2024

hey @toondata,
I am facing a similar issue while fine-tuning the model. Were you able to fix it?

@amyer

amyer commented Nov 2, 2024 via email

@anshkumar

anshkumar commented Nov 22, 2024

Getting the same error:

Parameter Offload: Total persistent parameters: 686592 in 401 params
{'loss': 4.9398, 'grad_norm': 243.39051064272118, 'learning_rate': 1.4005602240896359e-08, 'epoch': 0.0}                                                  
  0%|                                                                                                                | 1/23800 [00:03<25:10:16,  3.81s/it][rank1]: Traceback (most recent call last):
[rank1]:   File "/home/sort/ved/sort/apple_l2/Qwen2-VL-Finetune/src/training/train.py", line 209, in <module>
[rank1]:     train()
[rank1]:   File "/home/sort/ved/sort/apple_l2/Qwen2-VL-Finetune/src/training/train.py", line 184, in train
[rank1]:     trainer.train()
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 2122, in train
[rank1]:     return inner_training_loop(
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 2474, in _inner_training_loop
[rank1]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 3572, in training_step
[rank1]:     loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 3625, in compute_loss
[rank1]:     outputs = model(**inputs)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank1]:     ret_val = func(*args, **kwargs)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1846, in forward
[rank1]:     loss = self.module(*inputs, **kwargs)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank1]:     return inner()
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank1]:     result = forward_call(*args, **kwargs)
[rank1]:   File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/liger_kernel/transformers/model/qwen2_vl.py", line 109, in lce_forward
[rank1]:     inputs_embeds[image_mask] = image_embeds
[rank1]: RuntimeError: shape mismatch: value tensor of shape [49, 1536] cannot be broadcast to indexing result of shape [0, 1536]
W1122 16:10:57.817000 79821 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 79875 closing signal SIGTERM
E1122 16:10:58.032000 79821 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 1 (pid: 79876) of binary: /home/sort/miniconda3/envs/qwen2/bin/python
Traceback (most recent call last):
  File "/home/sort/miniconda3/envs/qwen2/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1165, in launch_command
    multi_gpu_launcher(args)
  File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/accelerate/commands/launch.py", line 799, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/distributed/run.py", line 910, in run
    elastic_launch(
  File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/sort/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
src/training/train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-11-22_16:10:57
  host      : sort-X570S-AORUS-MASTER
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 79876)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

@amyer

amyer commented Nov 22, 2024 via email

@anshkumar

anshkumar commented Nov 23, 2024


My dataset was missing <image> token. Adding it fixed the issue.
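
For anyone hitting this while fine-tuning: a hedged sketch of the difference, using the "conversations" layout and "<image>" placeholder convention of the Qwen2-VL-Finetune repo from the traceback above (adapt the keys to your own data format):

# Sample whose text never references the image: preprocessing finds zero
# image-token positions while the vision tower still emits features,
# which triggers the [49, 1536] -> [0, 1536] mismatch above.
bad_sample = {
    "conversations": [
        {"from": "human", "value": "Describe the photo."},
        {"from": "gpt", "value": "A stop sign on a street corner."},
    ]
}

# With the placeholder present, preprocessing can expand "<image>" into the
# model's <|vision_start|><|image_pad|>...<|vision_end|> span, so the token
# and feature counts line up.
good_sample = {
    "conversations": [
        {"from": "human", "value": "<image>\nDescribe the photo."},
        {"from": "gpt", "value": "A stop sign on a street corner."},
    ]
}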

@knoel99

knoel99 commented Dec 2, 2024

Commenting to get updates.

I have a similar error using colqwen.

from colpali_engine.models import ColQwen2, ColQwen2Processor
from colpali_engine.utils.torch_utils import get_torch_device
from transformers.models.qwen2_vl import Qwen2VLForConditionalGeneration, Qwen2VLProcessor
2024-12-02 14:21:56,934 - ERROR - Error processing documents_img/image.png:
Image features and image tokens do not match: tokens: 0, features 2160
Traceback (most recent call last):
  File "/content/vision-rag/vision-rag/colqwen.py", line 146, in process_single_document
    outputs = self.model(**inputs, output_hidden_states=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/vision-rag/vision-rag/colqwen.py", line 47, in forward
    return Qwen2VLForConditionalGeneration.forward(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1690, in forward
    raise ValueError(
ValueError: Image features and image tokens do not match: tokens: 0, features 2160
/content/vision-rag# pip show transformers colpali-engine
Name: transformers
Version: 4.46.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: colpali_engine, peft, sentence-transformers
---
Name: colpali_engine
Version: 0.3.4
Summary: The code used to train and run inference with the ColPali architecture.
Home-page: https://github.com/illuin-tech/colpali
Author: 
Author-email: Manuel Faysse <manuel.faysse@illuin.tech>, Hugues Sibille <hugues.sibille@illuin.tech>, Tony Wu <tony.wu@illuin.tech>
License: 
Location: /usr/local/lib/python3.10/dist-packages
Requires: gputil, numpy, peft, pillow, requests, torch, transformers
Required-by: 

@zucchini-nlp
Member

Just a heads up, this issue is for the mps device, where the error is expected since we haven't yet verified Qwen2-VL inference on mps. If you are experiencing the issue on 'cuda' or 'cpu', please open a new issue and report your env via transformers-cli env along with the inputs you used to trigger the error 🤗

@Koesn

Koesn commented Dec 5, 2024

Yes, I also have this problem when running Qwen2-VL on vLLM. The error always happens when making parallel requests.

Edit:
Replaced vLLM with LMDeploy; sending multiple pictures with parallel requests now works normally.

@huggingface huggingface deleted a comment from code30x58 Dec 18, 2024
@xf12333

xf12333 commented Jan 3, 2025

(Quoting @knoel99's comment above.)

Hello, have you managed to solve this? I'm running into a similar error.
[screenshot attached]

@amyer

amyer commented Jan 3, 2025 via email


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@amyer

amyer commented Jan 28, 2025 via email
