
PixtralProcessor always returns outputs of length 1 #34204

Closed
Infernaught opened this issue Oct 16, 2024 · 2 comments · Fixed by #34486 · May be fixed by #34321

Infernaught commented Oct 16, 2024

System Info

  • transformers version: 4.45.2
  • Platform: Linux-5.4.0-1113-oracle-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.23.2
  • Safetensors version: 0.4.5
  • Accelerate version: 0.34.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: False
  • Using GPU in script?: True
  • GPU type: NVIDIA RTX A5000

Who can help?

@ArthurZucker @amyeroberts Hello! I'm working on fine-tuning a VLM, and during my dataset preprocessing I'm noticing that the PixtralProcessor always returns the input ids for only the first example in a batch. I think this is caused by the lines here, which aggregate all of the images into a length-1 list of lists and break the iteration over the zip here. Is this unintended behavior, or am I doing something wrong?
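For context, here is a minimal, self-contained sketch (plain Python with placeholder names, not the actual processor code) of how collapsing all images into a length-1 list of lists would truncate a zip-based iteration to a single example:

```python
prompts = ["prompt 1", "prompt 2", "prompt 3"]
images = ["img 1", "img 2", "img 3"]  # stand-ins for decoded images

# Suspected aggregation: every image ends up inside one inner list.
images_nested = [images]  # length 1, not length 3

# zip stops at the shortest iterable, so only the first prompt is paired.
pairs = list(zip(prompts, images_nested))
print(len(pairs))  # 1, even though there are 3 examples

# Per-example nesting keeps one inner list per prompt and preserves the batch.
images_per_example = [[img] for img in images]
pairs = list(zip(prompts, images_per_example))
print(len(pairs))  # 3
```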

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("mistral-community/pixtral-12b")

# Define prompts as a list of 50 text prompts and images as a list of 50 decoded images
batch = processor(prompts, images, padding=True, return_tensors="pt")  # Returns a batch containing only the first of the 50 elements

Expected behavior

I would expect it to return the outputs corresponding to all 50 prompts and images.
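Assuming `prompts` and `images` are the 50-element lists from the reproduction above, a simple check of that expectation (using the `input_ids` key already mentioned in the report) would be:

```python
batch = processor(prompts, images, padding=True, return_tensors="pt")
# Expected: one row of input ids per example in the batch.
assert batch["input_ids"].shape[0] == len(prompts)  # should be 50; currently 1
```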

ArthurZucker (Collaborator) commented:

Ah, this is not expected. Would you like to open a PR for a fix? 🤗

Infernaught (Author) commented:

Sure, let me give it a shot.

Ryukijano added a commit to Ryukijano/transformers that referenced this issue Oct 22, 2024
Fixes huggingface#34204

Update `PixtralProcessor` to handle batches of images and text prompts correctly.

* Modify the `__call__` method in `src/transformers/models/pixtral/processing_pixtral.py` to process each example in a batch individually.
* Update the handling of images to correctly iterate over the zip of images, image sizes, and text.
* Add a test case in `tests/models/pixtral/test_processor_pixtral.py` to verify the `PixtralProcessor` returns the outputs corresponding to all prompts and images in a batch.
* Ensure the test case includes multiple images and text prompts in a batch and verifies the outputs match the expected outputs for all examples in the batch.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/huggingface/transformers/issues/34204?shareId=XXXX-XXXX-XXXX-XXXX).
Ryukijano added a commit to Ryukijano/transformers that referenced this issue Oct 29, 2024
…n batch

Related to huggingface#34204

Update `PixtralProcessor` to handle batches of images and text prompts correctly.

* **`src/transformers/models/pixtral/processing_pixtral.py`**
  - Modify the handling of images to correctly iterate over the zip of images, image sizes, and text.
  - Remove the aggregation of all images into a length 1 list of lists.
  - Ensure the `PixtralProcessor` processes each example in a batch individually.

* **`tests/models/pixtral/test_processor_pixtral.py`**
  - Add a test case to verify the `PixtralProcessor` returns the outputs corresponding to all prompts and images in a batch.
  - Ensure the test case includes multiple images and text prompts in a batch.
  - Verify the outputs of the `PixtralProcessor` match the expected outputs for all examples in the batch.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/huggingface/transformers/issues/34204?shareId=XXXX-XXXX-XXXX-XXXX).
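To make the fix direction described in these commit messages concrete, here is a hypothetical sketch of the normalization step they outline; the helper name and structure are illustrative assumptions, not the code from the referenced commits or from `processing_pixtral.py`:

```python
def normalize_images_per_prompt(images, texts):
    """Hypothetical helper: return one sub-list of images per text prompt,
    instead of aggregating everything into a single length-1 list of lists."""
    if images and not isinstance(images[0], (list, tuple)):
        # Flat list of images: pair each image with the prompt at the same index.
        images = [[img] for img in images]
    if len(images) != len(texts):
        raise ValueError(f"Got {len(images)} image groups for {len(texts)} prompts")
    return images

# Downstream, zipping per-example image lists with the prompts then yields one
# iteration per example (e.g. `for text, imgs in zip(texts, images): ...`),
# rather than a single iteration over a length-1 list of lists.
```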