
PixtralProcessor always returns outputs of length 1 #34204

Closed
Infernaught opened this issue Oct 16, 2024 · 2 comments · Fixed by #34486 · May be fixed by #34321

Infernaught commented Oct 16, 2024

System Info

  • transformers version: 4.45.2
  • Platform: Linux-5.4.0-1113-oracle-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.23.2
  • Safetensors version: 0.4.5
  • Accelerate version: 0.34.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: False
  • Using GPU in script?: True
  • GPU type: NVIDIA RTX A5000

Who can help?

@ArthurZucker @amyeroberts Hello! I'm working on fine-tuning a VLM, and during my dataset preprocessing I'm noticing that the PixtralProcessor always returns the input ids for only the first example in a batch. I think this is caused by the lines here, which aggregate all of the images into a length-1 list of lists and break the iteration over the zip here. Is this unintended behavior, or am I doing something wrong?
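For context, here is a minimal, self-contained sketch (plain Python with placeholder names, not the actual processor code) of how collapsing all images into a length-1 list of lists would truncate a zip-based iteration to a single example:

```python
prompts = ["prompt 1", "prompt 2", "prompt 3"]
images = ["img 1", "img 2", "img 3"]  # stand-ins for decoded images

# Suspected aggregation: every image ends up inside one inner list.
images_nested = [images]  # length 1, not length 3

# zip stops at the shortest iterable, so only the first prompt is paired.
pairs = list(zip(prompts, images_nested))
print(len(pairs))  # 1, even though there are 3 examples

# Per-example nesting keeps one inner list per prompt and preserves the batch.
images_per_example = [[img] for img in images]
pairs = list(zip(prompts, images_per_example))
print(len(pairs))  # 3
```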

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("mistral-community/pixtral-12b")

# Define prompts as a list of 50 text prompts and images as a list of 50 decoded images
batch = processor(prompts, images, padding=True, return_tensors="pt")  # Returns a batch containing only the first of the 50 elements

Expected behavior

I would expect it to return the outputs corresponding to all 50 prompts and images.
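Assuming `prompts` and `images` are the 50-element lists from the reproduction above, a simple check of that expectation (using the `input_ids` key already mentioned in the report) would be:

```python
batch = processor(prompts, images, padding=True, return_tensors="pt")
# Expected: one row of input ids per example in the batch.
assert batch["input_ids"].shape[0] == len(prompts)  # should be 50; currently 1
```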

ArthurZucker (Collaborator) commented:

Ah, this is not expected. Would you like to open a PR for a fix? 🤗

Infernaught (Author) commented:

Sure, let me give it a shot.

Ryukijano added a commit to Ryukijano/transformers that referenced this issue Oct 22, 2024
Fixes huggingface#34204

Update `PixtralProcessor` to handle batches of images and text prompts correctly.

* Modify the `__call__` method in `src/transformers/models/pixtral/processing_pixtral.py` to process each example in a batch individually.
* Update the handling of images to correctly iterate over the zip of images, image sizes, and text.
* Add a test case in `tests/models/pixtral/test_processor_pixtral.py` to verify the `PixtralProcessor` returns the outputs corresponding to all prompts and images in a batch.
* Ensure the test case includes multiple images and text prompts in a batch and verifies the outputs match the expected outputs for all examples in the batch.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/huggingface/transformers/issues/34204?shareId=XXXX-XXXX-XXXX-XXXX).
Ryukijano added a commit to Ryukijano/transformers that referenced this issue Oct 29, 2024
…n batch

Related to huggingface#34204

Update `PixtralProcessor` to handle batches of images and text prompts correctly.

* **`src/transformers/models/pixtral/processing_pixtral.py`**
  - Modify the handling of images to correctly iterate over the zip of images, image sizes, and text.
  - Remove the aggregation of all images into a length 1 list of lists.
  - Ensure the `PixtralProcessor` processes each example in a batch individually.

* **`tests/models/pixtral/test_processor_pixtral.py`**
  - Add a test case to verify the `PixtralProcessor` returns the outputs corresponding to all prompts and images in a batch.
  - Ensure the test case includes multiple images and text prompts in a batch.
  - Verify the outputs of the `PixtralProcessor` match the expected outputs for all examples in the batch.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/huggingface/transformers/issues/34204?shareId=XXXX-XXXX-XXXX-XXXX).
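To make the fix direction described in these commit messages concrete, here is a hypothetical sketch of the normalization step they outline; the helper name and structure are illustrative assumptions, not the code from the referenced commits or from `processing_pixtral.py`:

```python
def normalize_images_per_prompt(images, texts):
    """Hypothetical helper: return one sub-list of images per text prompt,
    instead of aggregating everything into a single length-1 list of lists."""
    if images and not isinstance(images[0], (list, tuple)):
        # Flat list of images: pair each image with the prompt at the same index.
        images = [[img] for img in images]
    if len(images) != len(texts):
        raise ValueError(f"Got {len(images)} image groups for {len(texts)} prompts")
    return images

# Downstream, zipping per-example image lists with the prompts then yields one
# iteration per example (e.g. `for text, imgs in zip(texts, images): ...`),
# rather than a single iteration over a length-1 list of lists.
```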