
Refactoring of ImageProcessorFast #35069

Merged
59 commits
2f00f0c
add init and base image processing functions
yonigozlan Dec 3, 2024
cfadb72
add add_fast_image_processor to transformers-cli
yonigozlan Dec 3, 2024
2cd73cb
add working fast image processor clip
yonigozlan Dec 3, 2024
932bd68
add fast image processor to doc, working tests
yonigozlan Dec 4, 2024
23d79ce
remove "to be implemented" SigLip
yonigozlan Dec 4, 2024
3f2d8a6
fix unprotected import
yonigozlan Dec 4, 2024
6a9d332
fix unprotected vision import
yonigozlan Dec 4, 2024
a1e2663
update ViTImageProcessorFast
yonigozlan Dec 4, 2024
fa74e7e
increase threshold slow fast equivalence
yonigozlan Dec 4, 2024
9dbd765
add fast img blip
yonigozlan Dec 4, 2024
d39ff52
add fast class in tests with cli
yonigozlan Dec 4, 2024
f609730
improve cli
yonigozlan Dec 5, 2024
8f7774d
add fast image processor convnext
yonigozlan Dec 6, 2024
809e1f0
add LlavaPatchingMixin and fast image processor for llava_next and ll…
yonigozlan Dec 7, 2024
f6e6cc2
add device kwarg to ImagesKwargs for fast processing on cuda
yonigozlan Dec 9, 2024
e1ce148
cleanup
yonigozlan Dec 9, 2024
a24d89c
fix unprotected import
yonigozlan Dec 9, 2024
522e200
group images by sizes and add batch processing
yonigozlan Dec 11, 2024
deefc5a
Add batch equivalence tests, skip when center_crop is used
yonigozlan Dec 11, 2024
6a2478e
cleanup
yonigozlan Dec 11, 2024
7d76305
update init and cli
yonigozlan Dec 11, 2024
142ed25
fix-copies
yonigozlan Dec 11, 2024
75bf56f
refactor convnext, cleanup base
yonigozlan Dec 16, 2024
de1fa18
fix
yonigozlan Dec 16, 2024
2ffc41d
remove patching mixins, add piped torchvision transforms for ViT
yonigozlan Dec 17, 2024
b524406
fix unbatched processing
yonigozlan Dec 17, 2024
9c2e2a4
fix f strings
yonigozlan Dec 17, 2024
8c773e0
protect imports
yonigozlan Dec 17, 2024
90fceba
change llava onevision to class transforms (test)
yonigozlan Dec 18, 2024
e878bdd
fix convnext
yonigozlan Dec 18, 2024
57acb7e
improve formatting (following Pavel review)
yonigozlan Jan 6, 2025
2a25104
fix handling device arg
yonigozlan Jan 6, 2025
4784fc8
improve cli
yonigozlan Jan 6, 2025
3ccd291
fix
yonigozlan Jan 6, 2025
053cdcb
fix inits
yonigozlan Jan 16, 2025
1b45e6e
Merge remote-tracking branch 'upstream/main' into improve-fast-image-…
yonigozlan Jan 21, 2025
9246945
Add distinction between preprocess and _preprocess, and support for a…
yonigozlan Jan 21, 2025
6ccd230
uniformize qwen2_vl fast
yonigozlan Jan 22, 2025
c4b8389
fix docstrings
yonigozlan Jan 22, 2025
e5c1e01
add add fast image processor llava
yonigozlan Jan 22, 2025
aef2fb4
remove min_pixels max_pixels from accepted size
yonigozlan Jan 22, 2025
7078a14
nit
yonigozlan Jan 22, 2025
aa94873
nit
yonigozlan Jan 22, 2025
13a125b
refactor fast image processors docstrings
yonigozlan Jan 28, 2025
8adb893
Merge remote-tracking branch 'upstream/main' into improve-fast-image-…
yonigozlan Jan 28, 2025
67d65f2
cleanup and remove fast class transforms
yonigozlan Jan 28, 2025
d225448
update add fast image processor transformers cli
yonigozlan Jan 28, 2025
80c6824
cleanup docstring
yonigozlan Jan 28, 2025
b96adfa
Merge remote-tracking branch 'upstream/main' into improve-fast-image-…
yonigozlan Jan 30, 2025
dbaacd1
uniformize pixtral fast and make _process_image explicit
yonigozlan Jan 30, 2025
b660e9d
Merge remote-tracking branch 'upstream/main' into improve-fast-image-…
yonigozlan Jan 30, 2025
b43ede1
fix prepare image structure llava next/onevision
yonigozlan Jan 30, 2025
3b05cbd
Use typed kwargs instead of explicit args
yonigozlan Feb 4, 2025
95db4a9
nit fix import Unpack
yonigozlan Feb 4, 2025
d9e1fcd
Merge branch 'main' into improve-fast-image-processor-base
yonigozlan Feb 4, 2025
6bd7a1b
clearly separate pops and gets in base preprocess. Use explicit typed…
yonigozlan Feb 4, 2025
565e482
Merge branch 'main' into improve-fast-image-processor-base
yonigozlan Feb 4, 2025
f85c06f
make qwen2_vl preprocess arguments hashable
yonigozlan Feb 4, 2025
1a7b0c4
Merge branch 'improve-fast-image-processor-base' of https://github.co…
yonigozlan Feb 4, 2025
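The commit `522e200` ("group images by sizes and add batch processing") points at the core trick that lets the fast processors batch variably sized inputs: images with the same (height, width) are grouped so each group can be stacked and transformed as a single tensor batch, then results are restored to the original order. A minimal pure-Python sketch of that idea — the function names, dict-based stand-in images, and signatures below are illustrative assumptions, not the actual transformers API:

```python
from collections import defaultdict

def group_images_by_shape(images):
    """Group images by (height, width) so each group can be stacked into
    one batch. Also records, for every input image, which group it went
    to and its index inside that group, so the original order can be
    restored after per-group processing."""
    grouped = defaultdict(list)
    index_in_group = []
    for img in images:
        shape = img["shape"]  # stand-in for tensor.shape[-2:]
        index_in_group.append((shape, len(grouped[shape])))
        grouped[shape].append(img)
    return grouped, index_in_group

def reorder(processed_groups, index_in_group):
    """Restore the original image order after per-group batch processing."""
    return [processed_groups[shape][i] for shape, i in index_in_group]
```

With real tensors, each `grouped[shape]` list would be stacked with `torch.stack` and run through the torchvision transforms in one call, which is where the speedup over per-image processing comes from.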
5 changes: 5 additions & 0 deletions docs/source/en/model_doc/blip.md
@@ -61,6 +61,11 @@ The original code can be found [here](https://github.com/salesforce/BLIP).
[[autodoc]] BlipImageProcessor
- preprocess

## BlipImageProcessorFast

[[autodoc]] BlipImageProcessorFast
- preprocess

<frameworkcontent>
<pt>

5 changes: 5 additions & 0 deletions docs/source/en/model_doc/clip.md
@@ -251,6 +251,11 @@ The resource should ideally demonstrate something new instead of duplicating an
[[autodoc]] CLIPImageProcessor
- preprocess

## CLIPImageProcessorFast

[[autodoc]] CLIPImageProcessorFast
- preprocess

## CLIPFeatureExtractor

[[autodoc]] CLIPFeatureExtractor
5 changes: 5 additions & 0 deletions docs/source/en/model_doc/convnext.md
@@ -64,6 +64,11 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] ConvNextImageProcessor
- preprocess

## ConvNextImageProcessorFast

[[autodoc]] ConvNextImageProcessorFast
- preprocess

<frameworkcontent>
<pt>

5 changes: 5 additions & 0 deletions docs/source/en/model_doc/deit.md
@@ -125,6 +125,11 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] DeiTImageProcessor
- preprocess

## DeiTImageProcessorFast

[[autodoc]] DeiTImageProcessorFast
- preprocess

<frameworkcontent>
<pt>

5 changes: 5 additions & 0 deletions docs/source/en/model_doc/llava.md
@@ -195,6 +195,11 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] LlavaImageProcessor
- preprocess

## LlavaImageProcessorFast

[[autodoc]] LlavaImageProcessorFast
- preprocess

## LlavaProcessor

[[autodoc]] LlavaProcessor
5 changes: 5 additions & 0 deletions docs/source/en/model_doc/llava_next.md
@@ -288,6 +288,11 @@ model = AutoModelForImageTextToText.from_pretrained(
[[autodoc]] LlavaNextImageProcessor
- preprocess

## LlavaNextImageProcessorFast

[[autodoc]] LlavaNextImageProcessorFast
- preprocess

## LlavaNextProcessor

[[autodoc]] LlavaNextProcessor
13 changes: 9 additions & 4 deletions docs/source/en/model_doc/llava_onevision.md
@@ -100,8 +100,8 @@ import torch
from PIL import Image
import requests

processor = AutoProcessor.from_pretrained("llava-hf/llava-onevision-qwen2-7b-ov-hf")
model = LlavaOnevisionForConditionalGeneration.from_pretrained("llava-hf/llava-onevision-qwen2-7b-ov-hf", torch_dtype=torch.float16, low_cpu_mem_usage=True)
processor = AutoProcessor.from_pretrained("llava-hf/llava-onevision-qwen2-7b-ov-hf")
model = LlavaOnevisionForConditionalGeneration.from_pretrained("llava-hf/llava-onevision-qwen2-7b-ov-hf", torch_dtype=torch.float16, low_cpu_mem_usage=True)
model.to("cuda:0")

# prepare image and text prompt, using the appropriate prompt template
@@ -298,8 +298,8 @@ First make sure to install flash-attn. Refer to the [original repository of Flas
from transformers import LlavaOnevisionForConditionalGeneration

model = LlavaOnevisionForConditionalGeneration.from_pretrained(
model_id,
torch_dtype=torch.float16,
model_id,
torch_dtype=torch.float16,
low_cpu_mem_usage=True,
use_flash_attention_2=True
).to(0)
@@ -318,6 +318,11 @@

[[autodoc]] LlavaOnevisionImageProcessor

## LlavaOnevisionImageProcessorFast

[[autodoc]] LlavaOnevisionImageProcessorFast
- preprocess

## LlavaOnevisionVideoProcessor

[[autodoc]] LlavaOnevisionVideoProcessor
5 changes: 5 additions & 0 deletions docs/source/en/model_doc/siglip.md
@@ -214,6 +214,11 @@ Below is an expected speedup diagram that compares inference time between the na
[[autodoc]] SiglipImageProcessor
- preprocess

## SiglipImageProcessorFast

[[autodoc]] SiglipImageProcessorFast
- preprocess

## SiglipProcessor

[[autodoc]] SiglipProcessor
5 changes: 5 additions & 0 deletions docs/source/ja/model_doc/blip.md
@@ -61,6 +61,11 @@ BLIP can perform a variety of multimodal tasks, such as
[[autodoc]] BlipImageProcessor
- preprocess

## BlipImageProcessorFast

[[autodoc]] BlipImageProcessorFast
- preprocess

<frameworkcontent>
<pt>

5 changes: 5 additions & 0 deletions docs/source/ja/model_doc/clip.md
@@ -133,6 +133,11 @@ Official Hugging Face and community resources to help you get started with CLIP
[[autodoc]] CLIPImageProcessor
- preprocess

## CLIPImageProcessorFast

[[autodoc]] CLIPImageProcessorFast
- preprocess

## CLIPFeatureExtractor

[[autodoc]] CLIPFeatureExtractor
5 changes: 5 additions & 0 deletions docs/source/ja/model_doc/convnext.md
@@ -64,6 +64,11 @@ Official Hugging Face and community resources to help you get started with ConvNeXT
[[autodoc]] ConvNextImageProcessor
- preprocess

## ConvNextImageProcessorFast

[[autodoc]] ConvNextImageProcessorFast
- preprocess

<frameworkcontent>
<pt>

5 changes: 5 additions & 0 deletions docs/source/ja/model_doc/deit.md
@@ -98,6 +98,11 @@ Official Hugging Face and community resources to help you get started with DeiT
[[autodoc]] DeiTImageProcessor
- preprocess

## DeiTImageProcessorFast

[[autodoc]] DeiTImageProcessorFast
- preprocess

<frameworkcontent>
<pt>

5 changes: 1 addition & 4 deletions examples/modular-transformers/modeling_new_task_model.py
@@ -452,10 +452,7 @@ def prepare_inputs_for_generation(
return model_inputs

def resize_token_embeddings(
self,
new_num_tokens: Optional[int] = None,
pad_to_multiple_of=None,
mean_resizing=True
self, new_num_tokens: Optional[int] = None, pad_to_multiple_of=None, mean_resizing=True
) -> nn.Embedding:
model_embeds = self.language_model.resize_token_embeddings(new_num_tokens, pad_to_multiple_of, mean_resizing)

5 changes: 1 addition & 4 deletions examples/modular-transformers/modular_new_task_model.py
@@ -70,10 +70,7 @@ def forward(
return (embeddings,) + vlm_outputs

def resize_token_embeddings(
self,
new_num_tokens: Optional[int] = None,
pad_to_multiple_of=None,
mean_resizing=True
self, new_num_tokens: Optional[int] = None, pad_to_multiple_of=None, mean_resizing=True
) -> nn.Embedding:
model_embeds = self.language_model.resize_token_embeddings(new_num_tokens, pad_to_multiple_of, mean_resizing)

16 changes: 16 additions & 0 deletions src/transformers/__init__.py
@@ -1308,11 +1308,19 @@
]
else:
_import_structure["image_processing_utils_fast"] = ["BaseImageProcessorFast"]
_import_structure["models.blip"].append("BlipImageProcessorFast")
_import_structure["models.clip"].append("CLIPImageProcessorFast")
_import_structure["models.convnext"].append("ConvNextImageProcessorFast")
_import_structure["models.deformable_detr"].append("DeformableDetrImageProcessorFast")
_import_structure["models.deit"].append("DeiTImageProcessorFast")
_import_structure["models.detr"].append("DetrImageProcessorFast")
_import_structure["models.llava"].append("LlavaImageProcessorFast")
_import_structure["models.llava_next"].append("LlavaNextImageProcessorFast")
_import_structure["models.llava_onevision"].append("LlavaOnevisionImageProcessorFast")
_import_structure["models.pixtral"].append("PixtralImageProcessorFast")
_import_structure["models.qwen2_vl"].append("Qwen2VLImageProcessorFast")
_import_structure["models.rt_detr"].append("RTDetrImageProcessorFast")
_import_structure["models.siglip"].append("SiglipImageProcessorFast")
_import_structure["models.vit"].append("ViTImageProcessorFast")

try:
@@ -6442,11 +6450,19 @@
from .utils.dummy_torchvision_objects import *
else:
from .image_processing_utils_fast import BaseImageProcessorFast
from .models.blip import BlipImageProcessorFast
from .models.clip import CLIPImageProcessorFast
from .models.convnext import ConvNextImageProcessorFast
from .models.deformable_detr import DeformableDetrImageProcessorFast
from .models.deit import DeiTImageProcessorFast
from .models.detr import DetrImageProcessorFast
from .models.llava import LlavaImageProcessorFast
from .models.llava_next import LlavaNextImageProcessorFast
from .models.llava_onevision import LlavaOnevisionImageProcessorFast
from .models.pixtral import PixtralImageProcessorFast
from .models.qwen2_vl import Qwen2VLImageProcessorFast
from .models.rt_detr import RTDetrImageProcessorFast
from .models.siglip import SiglipImageProcessorFast
from .models.vit import ViTImageProcessorFast

try:
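The `src/transformers/__init__.py` hunks register each new fast processor twice: once in `_import_structure`, which drives lazy loading, and once in the eager-import branch used when torchvision is available. A simplified sketch of that lazy registration pattern, with stdlib modules standing in for the transformers submodules — this is an illustration of the idea, not the real `_LazyModule`:

```python
import importlib

# Public names are registered up front; the backing module is imported
# only when a name is first looked up. Stdlib modules stand in for the
# real transformers submodules in this sketch.
_import_structure = {
    "math": ["sqrt", "floor"],
    "json": ["dumps"],
}

# Invert the mapping: attribute name -> module path it loads from.
_attr_to_module = {
    attr: mod for mod, attrs in _import_structure.items() for attr in attrs
}

def lazy_getattr(name):
    """A module-level __getattr__ (PEP 562) would delegate here."""
    try:
        module_path = _attr_to_module[name]
    except KeyError:
        raise AttributeError(name) from None
    module = importlib.import_module(module_path)  # deferred until first use
    return getattr(module, name)
```

Keeping the two registration sites in sync is exactly why the PR appends each `*ImageProcessorFast` name in both the `_import_structure` block and the import branch below it.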