fix(deps): update dependency transformers to v4.42.3 #1209
Merged
This PR contains the following updates:
4.41.1 -> 4.42.3
Release Notes
huggingface/transformers (transformers)
v4.42.3: Patch release
Make sure we have attention softcapping for "eager" GEMMA2 model
After experimenting, we noticed that softcapping is a must, mostly for the 27b model, so we are adding it back (it should have been there, but an error on my side made it disappear). Sorry all! 😭
v4.42.2: Patch release
Thanks to our 2 contributors for their prompt fixes, which mostly apply to training and FA2!
v4.42.1: Patch release
Patch release for commit:
v4.42.0: Gemma 2, RT-DETR, InstructBLIP, LLaVa NeXT Video, New Model Adder
New model additions
Gemma-2
The Gemma2 model was proposed in Gemma2: Open Models Based on Gemini Technology and Research by Gemma2 Team, Google.
Gemma2 models are trained on 6T tokens and released in 2 versions, 2b and 7b.
The abstract from the paper is the following:
This work introduces Gemma2, a new family of open language models demonstrating strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma2 outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of our model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations
RTDETR
The RT-DETR model was proposed in DETRs Beat YOLOs on Real-time Object Detection by Wenyu Lv, Yian Zhao, Shangliang Xu, Jinman Wei, Guanzhong Wang, Cheng Cui, Yuning Du, Qingqing Dang, Yi Liu.
RT-DETR is an object detection model that stands for “Real-Time DEtection Transformer.” This model is designed to perform object detection tasks with a focus on achieving real-time performance while maintaining high accuracy. Leveraging the transformer architecture, which has gained significant popularity in various fields of deep learning, RT-DETR processes images to identify and locate multiple objects within them.
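As a rough illustration, here is a minimal inference sketch; the checkpoint name PekingU/rtdetr_r50vd and the 0.5 score threshold are assumptions, not part of these notes:

```python
# Minimal RT-DETR inference sketch. The checkpoint name and the 0.5 threshold are assumptions.
import requests
import torch
from PIL import Image
from transformers import RTDetrForObjectDetection, RTDetrImageProcessor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_r50vd")
model = RTDetrForObjectDetection.from_pretrained("PekingU/rtdetr_r50vd")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Turn raw logits/boxes into labelled detections above a confidence threshold.
results = processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.5
)
for score, label, box in zip(results[0]["scores"], results[0]["labels"], results[0]["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```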
InstructBlip
The InstructBLIP model was proposed in InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. InstructBLIP leverages the BLIP-2 architecture for visual instruction tuning.
InstructBLIP uses the same architecture as BLIP-2 with a tiny but important difference: it also feeds the text prompt (instruction) to the Q-Former.
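A minimal usage sketch, assuming the Salesforce/instructblip-vicuna-7b checkpoint as an example:

```python
# Minimal InstructBLIP sketch. The checkpoint name is an assumption; any InstructBLIP checkpoint works.
import requests
from PIL import Image
from transformers import InstructBlipForConditionalGeneration, InstructBlipProcessor

checkpoint = "Salesforce/instructblip-vicuna-7b"  # assumed example checkpoint
processor = InstructBlipProcessor.from_pretrained(checkpoint)
model = InstructBlipForConditionalGeneration.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# The instruction is tokenized both for the language model and for the Q-Former,
# which is the "tiny but important difference" from BLIP-2.
inputs = processor(images=image, text="Describe the image in detail.", return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```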
LLaVa-NeXT-Video
The LLaVa-NeXT-Video model was proposed in LLaVA-NeXT: A Strong Zero-shot Video Understanding Model by Yuanhan Zhang, Bo Li, Haotian Liu, Yong Jae Lee, Liangke Gui, Di Fu, Jiashi Feng, Ziwei Liu, Chunyuan Li. LLaVa-NeXT-Video improves upon LLaVa-NeXT by fine-tuning on a mix of video and image datasets, thus increasing the model's performance on videos.
LLaVA-NeXT surprisingly has strong performance in understanding video content in zero-shot fashion thanks to the AnyRes technique that it uses. The AnyRes technique naturally represents a high-resolution image as multiple images, and it generalizes naturally to videos because a video can be considered as a set of frames (similar to a set of images in LLaVa-NeXT). The current version of LLaVA-NeXT-Video applies AnyRes and supervised fine-tuning (SFT) on top of LLaVA-NeXT on video data to achieve better video understanding capabilities. The model is currently SOTA among open-source models on the VideoMME benchmark.
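A minimal video-inference sketch, assuming the llava-hf/LLaVA-NeXT-Video-7B-hf checkpoint and its USER/ASSISTANT prompt format; check the model card of the checkpoint you actually use for the exact template:

```python
# Minimal LLaVa-NeXT-Video sketch. The checkpoint name and the USER/ASSISTANT prompt format
# are assumptions taken as a plausible example; consult the model card you actually use.
import numpy as np
from transformers import LlavaNextVideoForConditionalGeneration, LlavaNextVideoProcessor

model_id = "llava-hf/LLaVA-NeXT-Video-7B-hf"  # assumed example checkpoint
processor = LlavaNextVideoProcessor.from_pretrained(model_id)
model = LlavaNextVideoForConditionalGeneration.from_pretrained(model_id)

# A video is treated as a stack of frames (num_frames, height, width, channels); random
# frames stand in here for real decoded video data.
video = np.random.randint(0, 255, size=(8, 336, 336, 3), dtype=np.uint8)

prompt = "USER: <video>\nWhat is happening in this video? ASSISTANT:"
inputs = processor(text=prompt, videos=video, return_tensors="pt")

output_ids = model.generate(**inputs, max_new_tokens=60)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```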
New model adder
A very significant change makes its way into the transformers codebase, introducing a new way to add models to transformers. We recommend reading the description of the PR below, but here is the gist of it.
Tool-use and RAG model support
We've made major updates to our support for tool-use and RAG models. We can now automatically generate JSON schema descriptions for Python functions which are suitable for passing to tool models, and we've defined a standard API for tool models which should allow the same tool inputs to be used with many different models. Models will need updates to their chat templates to support the new API, and we're targeting the Nous-Hermes, Command-R and Mistral/Mixtral model families for support in the very near future. Please see the updated chat template docs for more information.
If you are the owner of a model that supports tool use, but you're not sure how to update its chat template to support the new API, feel free to reach out to us for assistance with the update, for example on the Hugging Face Discord server. Ping Matt and yell key phrases like "chat templates" and "Jinja" and your issue will probably get resolved.
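A sketch of the workflow described above, assuming a checkpoint whose chat template already supports the new tools API (the Hermes-2-Pro checkpoint below is only an example):

```python
# Sketch of the tool-use workflow: a typed, documented Python function is converted to a
# JSON schema and handed to the chat template. Assumptions: the checkpoint below is only an
# example, and its chat template must already support the new tools API.
from transformers import AutoTokenizer
from transformers.utils import get_json_schema


def get_current_temperature(location: str, unit: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"
        unit: The unit to return the temperature in. (choices: ["celsius", "fahrenheit"])
    """
    return 22.0  # a real tool would call a weather API here


# The JSON schema is generated from the signature and the docstring.
schema = get_json_schema(get_current_temperature)
print(schema)

tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B")  # example model
messages = [{"role": "user", "content": "What's the weather like in Paris right now?"}]

# The schema (or the function itself) is passed to the template via `tools`.
prompt = tokenizer.apply_chat_template(
    messages, tools=[schema], add_generation_prompt=True, tokenize=False
)
print(prompt)
```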
GGUF support
We extend the support of GGUF files to offer fine-tuning within the Python/HF ecosystem, before converting the models back to the GGUF/GGML/llama.cpp libraries.
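A minimal sketch of the round trip, with placeholder repo and file names:

```python
# Sketch of the GGUF round trip: load a GGUF file (it is dequantized into a regular torch
# model), fine-tune it in the HF ecosystem, then save a standard checkpoint that can be
# converted back to GGUF with the llama.cpp tooling. Repo and file names are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"   # placeholder repo
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"   # placeholder file

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)

# From here the model behaves like any other transformers model (Trainer, PEFT, ...).
model.save_pretrained("dequantized-checkpoint")
tokenizer.save_pretrained("dequantized-checkpoint")
```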
Trainer improvements
A new optimizer is added in the Trainer.
Quantization improvements
Several improvements have been made related to quantization: a new cache (the quantized KV cache) is added, offering the ability to quantize the cache of generative models and further reduce memory requirements.
Additionally, the documentation related to quantization is entirely redone with the aim of helping users choose the best quantization method.
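As a sketch of the quantized KV cache during generation (the checkpoint is a placeholder, the quanto backend must be installed, and the cache_config keys follow the cache documentation):

```python
# Sketch of generating with the quantized KV cache. The checkpoint is a placeholder, the
# `quanto` backend must be installed, and the cache_config keys follow the cache docs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("The theory of relativity states that", return_tensors="pt").to(model.device)

# Quantizing the KV cache reduces the memory needed for long generations.
output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    cache_implementation="quantized",
    cache_config={"backend": "quanto", "nbits": 4},
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```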
Examples
New instance segmentation examples are added by @qubvel
Notable improvements
As a notable improvement to the HF vision models that leverage backbones, we enable loading HF pretrained model weights as backbones, with the following API:
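A sketch of that API, assuming MaskFormer with a microsoft/resnet-50 backbone as the example; other backbone-based vision models expose the same config options:

```python
# Sketch of using HF pretrained weights as a backbone. MaskFormer with a ResNet-50 backbone
# is an assumed example; other backbone-based vision models follow the same pattern.
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation

config = MaskFormerConfig(
    backbone="microsoft/resnet-50",   # Hub id of the pretrained backbone
    use_pretrained_backbone=True,     # load its pretrained weights rather than random init
)
model = MaskFormerForInstanceSegmentation(config)
```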
Additionally, we thank @Cyrilvallez for diving into our generate() method and greatly reducing its memory requirements (🔥🔥🔥 by @Cyrilvallez in #30536).
Breaking changes
Remove ConversationalPipeline and Conversation object
Both the ConversationalPipeline and the Conversation object have been deprecated for a while, and are due for removal in 4.42, which is the upcoming version.
The TextGenerationPipeline is recommended for this use-case, and now accepts inputs in the form of the OpenAI API.
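A sketch of the replacement workflow, with a placeholder chat model:

```python
# Sketch of chat-style usage of TextGenerationPipeline: inputs are OpenAI-style
# {"role", "content"} messages. The model name is a placeholder.
from transformers import pipeline

chatbot = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")  # placeholder model

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain what a KV cache is in one sentence."},
]

result = chatbot(messages, max_new_tokens=64)
# The returned "generated_text" is the conversation with the assistant reply appended.
print(result[0]["generated_text"][-1]["content"])
```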
Remove an accidental duplicate softmax application in FLAVA's attention
Removes a duplicate softmax application in FLAVA's attention. This is likely to cause a small change in the outputs, but it is flagged with 🚨 as it will change them a bit.
Idefics2's ignore_index attribute of the loss is updated to -100
out_indices from timm being updated
Recent updates to timm changed the type of the attribute model.feature_info.out_indices. Previously, out_indices would reflect the input type of out_indices on the create_model call, i.e. either tuple or list. Now, this value is always a tuple. As lists are more useful and consistent for us -- we cannot save tuples in configs, they must be converted to lists first -- we instead choose to cast out_indices to always be a list.
This has the possibility of being a slight breaking change if users are creating models and relying on out_indices being a tuple. As this only happens when a new model is created, and not when it is saved and reloaded (because of the config), it has a low chance of having much of an impact.
Datasets referenced in the quantization config get updated to remove references to datasets with restrictive licenses.
Bugfixes and improvements
mamba slow forward by @vasqu in #30691
tokenizer_class = "AutoTokenizer" Llava Family by @ArthurZucker in #30912
optimum-benchmark by @ydshieh in #30615
torch.use_deterministic_algorithms for XPU by @faaany in #30774
MptIntegrationTests expected outputs by @ydshieh in #30989
uv==0.1.45 by @ydshieh in #31006
test_model_parallelism device-agnostic by @faaany in #30844
test_model_parallelism for 2 model test classes by @ydshieh in #31067
@main by @ydshieh in #31065
ninja from docker image build by @ydshieh in #31080
accelerate as a hard requirement by @younesbelkada in #31090
OPTForQuestionAnswering by @younesbelkada in #31092
test_multi_gpu_data_parallel_forward for vit and deit by @ydshieh in #31086
HF_HUB_OFFLINE + fix has_file in offline mode by @Wauplin in #31016
transformers-cli env reporting by @statelesshz in #31003
load_in_8bit with bnb config by @younesbelkada in #31136
IS_GITHUB_CI by @younesbelkada in #31147
[GemmaModel] fix small typo by @ArthurZucker in #31202
test_compile_static_cache by @ydshieh in #30991
mistral.py::Mask4DTestHard by @ydshieh in #31212
MistralIntegrationTest by @ydshieh in #31231
BlipModel by @younesbelkada in #31235
name 'torch' is not defined in bitsandbytes integration by @jamesbraza in #31243
benchmark job in push-important-models.yml by @ydshieh in #31259
[SwitchTransformer] Significant performance improvement on MoE blocks by @ranggihwang in #31173
cached_download to hf_hub_download in remaining occurrences by @Wauplin in #31284
str should be used not int when setting env variables by @statelesshz in #31272
decoder_attention_mask shape by @ylacombe in #28071
inputs_embeds padding logger.warning to logger.warning_once by @naimenz in #31411
tokenizer being popped twice by @gante in #31427
TestDeepSpeedModelZoo device-agnostic by @faaany in #31402
dataloader_persistent_workers=True by @bastienlc in #30627
Qwen2ForTokenClassification by @kevinhu in #31440
generate call from local path by @gante in #31470
PreTrainedTokenizerFast loading time when there are many added tokens by @ydshieh in #31404
metric_for_best_model errors by @tomaarsen in #31450
[GPT2] Add SDPA support by @vasqu in #31172
test_config_object to test_ds_config_object by @faaany in #31403
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Mend Renovate. View repository job log here.