Added support for Tapas Model #520

Merged
21 commits
7939e13
Added support for Tapas Model
Nov 27, 2022
2db0d45
Merge branch 'huggingface:main' into Add-Bettertransformer-support-fo…
JuheonChu Nov 27, 2022
283186a
Added support for Tapas Model
Nov 27, 2022
b874831
Merge branch 'Add-Bettertransformer-support-for-Tapas' of https://git…
Nov 27, 2022
7c24a42
reformatted files with black
Nov 28, 2022
12d6154
Update tests/bettertransformer/test_bettertransformer_encoder.py
JuheonChu Nov 28, 2022
14b35ca
Update optimum/bettertransformer/models/encoder_models.py
JuheonChu Nov 28, 2022
9632f98
Merge branch 'huggingface:main' into Add-Bettertransformer-support-fo…
JuheonChu Nov 28, 2022
d2e6ebb
Update tests/bettertransformer/test_bettertransformer_encoder.py
JuheonChu Nov 28, 2022
762a804
Styled optimum files
Nov 28, 2022
ba1079e
Merge branch 'Add-Bettertransformer-support-for-Tapas' of https://git…
Nov 28, 2022
4047267
Update optimum/bettertransformer/models/encoder_models.py
JuheonChu Nov 28, 2022
ac716cc
Update optimum/bettertransformer/models/encoder_models.py
JuheonChu Nov 28, 2022
add72dd
Merge branch 'huggingface:main' into Add-Bettertransformer-support-fo…
JuheonChu Nov 28, 2022
bbd625e
Update tests/bettertransformer/test_bettertransformer_encoder.py
JuheonChu Nov 29, 2022
fded9ac
Moved Tapas Encoder model to Encoder
JuheonChu Nov 29, 2022
7fcc616
change mapping in __init_.py
JuheonChu Nov 29, 2022
579f889
deleted
JuheonChu Nov 29, 2022
fe547eb
Update __init__.py
JuheonChu Nov 29, 2022
1d64709
Update __init__.py
JuheonChu Nov 29, 2022
64e644c
refactor doc + remove class + styling
younesbelkada Nov 30, 2022
33 changes: 25 additions & 8 deletions docs/source/bettertransformer/overview.mdx
@@ -14,7 +14,6 @@ specific language governing permissions and limitations under the License.

🤗 Optimum provides an integration with `BetterTransformer`, a stable API from PyTorch to benefit from interesting speedups on CPU & GPU through sparsity and fused kernels.


## Quickstart

Since version 1.13, PyTorch ships a stable version of `BetterTransformer` in its library. You can benefit from speedups on most consumer-type devices, including CPUs and both older and newer NVIDIA GPUs.
@@ -23,6 +22,7 @@ You can now use this feature in 🤗 Optimum together with Transformers and use
### Supported models

The supported models are listed below:

- [AlBERT](https://arxiv.org/abs/1909.11942)
- [BART](https://arxiv.org/abs/1910.13461)
- [BERT](https://arxiv.org/abs/1810.04805)
@@ -39,6 +39,7 @@ The list of supported model below:
- [MarkupLM](https://arxiv.org/abs/2110.08518)
- [RoBERTa](https://arxiv.org/abs/1907.11692)
- [Splinter](https://arxiv.org/abs/2101.00438)
- [Tapas](https://arxiv.org/abs/2004.02349)
- [ViLT](https://arxiv.org/abs/2102.03334)
- [ViT](https://arxiv.org/abs/2010.11929)
- [ViT-MAE](https://arxiv.org/abs/2111.06377)
@@ -60,20 +61,36 @@ In order to use the `BetterTransformer` API just run the following commands:
>>> model_hf = AutoModelForSequenceClassification.from_pretrained("bert-base-cased")
>>> model = BetterTransformer.transform(model_hf, keep_original_model=True)
```

You can leave `keep_original_model=False` in case you want to overwrite the current model with its `BetterTransformer` version.

See the `tutorials` section for more details on how to use it, or check the [Google colab demo](https://colab.research.google.com/drive/1Lv2RCG_AT6bZNdlL1oDDNNiwBBuirwI-?usp=sharing)!


<div class="mt-10">
<div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-2 md:gap-y-4 md:gap-x-5">
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./tutorials/convert"
><div class="w-full text-center bg-gradient-to-br from-blue-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Tutorials</div>
<p class="text-gray-700">Learn the basics and become familiar with 🤗 and `BetterTransformer` integration. Start here if you are using 🤗 Optimum for the first time!</p>
<a
class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./tutorials/convert"
>
<div class="w-full text-center bg-gradient-to-br from-blue-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Tutorials
</div>
<p class="text-gray-700">
Learn the basics and become familiar with 🤗 and `BetterTransformer`
integration. Start here if you are using 🤗 Optimum for the first time!
</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./tutorials/contribute"
><div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">How-to guides</div>
<p class="text-gray-700">You want to add your own model for `BetterTransformer` support? Start here to check the contribution guideline!</p>
<a
class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./tutorials/contribute"
>
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
How-to guides
</div>
<p class="text-gray-700">
You want to add your own model for `BetterTransformer` support? Start
here to check the contribution guideline!
</p>
</a>
</div>
</div>
3 changes: 3 additions & 0 deletions optimum/bettertransformer/models/__init__.py
@@ -19,6 +19,7 @@
BertLayerBetterTransformer,
DistilBertLayerBetterTransformer,
FSMTEncoderLayerBetterTransformer,
TapasLayerBetterTransformer,
Contributor:
To be removed

ViltLayerBetterTransformer,
ViTLayerBetterTransformer,
Wav2Vec2EncoderLayerBetterTransformer,
@@ -70,6 +71,8 @@
# FSMTModel:
"EncoderLayer": FSMTEncoderLayerBetterTransformer,
"ViltLayer": ViltLayerBetterTransformer,
# Tapas Model
"TapasLayer": TapasLayerBetterTransformer,
}
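
A minimal sketch of how a name-keyed mapping like the one in this hunk can be consulted, and of the point raised in review that the BERT conversion class can also serve Tapas. The classes here are empty placeholders, `LAYERS_MAPPING` and `find_converter` are hypothetical names, and the `"TapasLayer": BertLayerBetterTransformer` entry is an assumption drawn from the review discussion rather than a copy of the merged code.

```python
class BertLayerBetterTransformer:
    """Placeholder standing in for the real conversion class."""


class ViltLayerBetterTransformer:
    """Placeholder standing in for the real conversion class."""


# Hypothetical mapping: keys are the class names of the original layers.
LAYERS_MAPPING = {
    "BertLayer": BertLayerBetterTransformer,
    "ViltLayer": ViltLayerBetterTransformer,
    # Assumption: Tapas layers follow BERT's layout, so they reuse its converter.
    "TapasLayer": BertLayerBetterTransformer,
}


def find_converter(layer):
    """Return the conversion class registered for `layer`, or None."""
    return LAYERS_MAPPING.get(type(layer).__name__)


class TapasLayer:
    """Placeholder for the original (unconverted) Tapas layer."""


converter = find_converter(TapasLayer())
print(converter.__name__)
```

Keying on the class name rather than the class object keeps the lookup independent of which model classes are importable; an unregistered layer simply yields `None`.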


108 changes: 108 additions & 0 deletions optimum/bettertransformer/models/encoder_models.py
@@ -17,6 +17,114 @@
from .base import BetterTransformerBaseLayer


class TapasLayerBetterTransformer:
JuheonChu marked this conversation as resolved.
def __init__(self, tapas_layer, config):
Contributor:
I meant to remove the class TapasLayerBetterTransformer in full 😅 It is not needed in the end since BertLayerBetterTransformer can handle Tapas.

Contributor Author:
I think I removed class TapasLayerBetterTransformer. Is there anything to be adjusted in this file?

Contributor:
On your branch, with optimum dev installed (pip install -e . in the optimum top folder), can you try to run the script @younesbelkada provided you and see what happens?

```python
from transformers import AutoModel
from optimum.bettertransformer import BetterTransformer

model_id = "hf-internal-testing/tiny-random-TapasModel"
model = AutoModel.from_pretrained(model_id)
bt_model = BetterTransformer.transform(model)
```

r"""
A simple conversion of the TAPAS layer to its `BetterTransformer` implementation.

Args:
tapas_layer (`torch.nn.Module`):
The original TAPAS Layer where the weights needs to be retrieved.
"""
super().__init__(config)
# In_proj layer
self.in_proj_weight = nn.Parameter(
torch.cat(
[
tapas_layer.attention.query.weight,
Contributor:
Looking at the Tapas implementation, https://github.com/huggingface/transformers/blob/28247e78819ab9756b81f8df39611c333d099400/src/transformers/models/tapas/modeling_tapas.py#L442 , I think we need `tapas_layer.attention.self.query.weight` here and in the remaining lines; that is why the test is failing!

Contributor Author:
Oh... thank you for sharing your insights!

tapas_layer.attention.key.weight,
tapas_layer.attention.value.weight,
]
)
)
self.in_proj_bias = nn.Parameter(
torch.cat(
[
tapas_layer.attention.query.bias,
tapas_layer.attention.key.bias,
tapas_layer.attention.value.bias,
]
)
)

# Out proj layer
self.out_proj_weight = tapas_layer.attention.dense.weight
self.out_proj_bias = tapas_layer.attention.dense.bias

# Linear layer 1
self.linear1_weight = tapas_layer.ffn.weight
self.linear1_bias = tapas_layer.ffn.bias

# Linear layer 2
self.linear2_weight = tapas_layer.ffn_output.weight
self.linear2_bias = tapas_layer.ffn_output.bias

# Layer norm 1
self.norm1_eps = tapas_layer.attention.LayerNorm.eps
self.norm1_weight = tapas_layer.attention.LayerNorm.weight
self.norm1_bias = tapas_layer.attention.LayerNorm.bias

# Layer norm 2
self.norm2_eps = tapas_layer.full_layer_layer_norm.eps
self.norm2_weight = tapas_layer.full_layer_layer_norm.weight
self.norm2_bias = tapas_layer.full_layer_layer_norm.bias

# Model hyper parameters
self.num_heads = tapas_layer.attention.num_attention_heads
self.embed_dim = tapas_layer.attention.all_head_size

# Last step: set the last layer to `False` -> this will be set to `True` when converting the model
self.is_last_layer = False

self.validate_bettertransformer()
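
The attribute-path issue raised in the review comment on this `__init__` can be reproduced without loading a model. This is only a sketch under assumptions: `SimpleNamespace` objects stand in for the real modules, `resolve` and `make_linear` are hypothetical helpers, and the layout (query/key/value nested under `attention.self`) is the BERT-style structure the reviewer points to.

```python
from functools import reduce
from types import SimpleNamespace


def resolve(obj, dotted_path):
    """Follow a dotted attribute path such as 'attention.self.query'."""
    return reduce(getattr, dotted_path.split("."), obj)


def make_linear(name):
    # Stand-in for a linear module; only the attribute names matter here.
    return SimpleNamespace(weight=f"{name}.weight", bias=f"{name}.bias")


# Hypothetical stand-in for a BERT-style layer: Q/K/V live under `attention.self`.
self_attention = SimpleNamespace(
    query=make_linear("query"),
    key=make_linear("key"),
    value=make_linear("value"),
)
attention = SimpleNamespace()
setattr(attention, "self", self_attention)
tapas_layer = SimpleNamespace(attention=attention)

# Path suggested in review: resolves.
print(resolve(tapas_layer, "attention.self.query.weight"))

# Path used in this diff: one nesting level is missing.
try:
    resolve(tapas_layer, "attention.query.weight")
except AttributeError as err:
    print("AttributeError:", err)
```

On a layer laid out this way, every `tapas_layer.attention.<name>` lookup for query, key, or value fails in the same manner, which matches the reviewer's explanation for the failing test.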

def forward(self, hidden_states, attention_mask, *_, **__):
r"""
This is just a wrapper around the forward function proposed in:
https://github.com/huggingface/transformers/pull/19553
"""
super().forward_checker()

if hidden_states.is_nested:
attention_mask = None

if attention_mask is not None:
# attention mask comes in with values 0 and -inf. we convert to torch.nn.TransformerEncoder style bool mask
# 0->false->keep this token -inf->true->mask this token
attention_mask = attention_mask.bool()
attention_mask = torch.reshape(attention_mask, (attention_mask.shape[0], attention_mask.shape[-1]))
seqlen = attention_mask.shape[1]
lengths = torch.sum(~attention_mask, 1)
if not all([l == seqlen for l in lengths]):
hidden_states = torch._nested_tensor_from_mask(hidden_states, ~attention_mask)
attention_mask = None

hidden_states = torch._transformer_encoder_layer_fwd(
hidden_states,
self.embed_dim,
self.num_heads,
self.in_proj_weight,
self.in_proj_bias,
self.out_proj_weight,
self.out_proj_bias,
self.use_gelu,
self.norm_first,
self.norm1_eps,
self.norm1_weight,
self.norm1_bias,
self.norm2_weight,
self.norm2_bias,
self.linear1_weight,
self.linear1_bias,
self.linear2_weight,
self.linear2_bias,
attention_mask,
)
if hidden_states.is_nested and self.is_last_layer:
hidden_states = hidden_states.to_padded_tensor(0.0)
return (hidden_states,)
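
The mask handling in `forward` above can be traced with plain Python lists. This is a sketch of the logic only, not the tensor code: `to_padding_mask`, `sequence_lengths`, and `has_padding` are hypothetical helper names, and nested lists stand in for a `(batch, seq_len)` mask.

```python
NEG_INF = float("-inf")


def to_padding_mask(additive_mask):
    """Convert an additive mask (0 = keep, -inf = masked) to booleans.

    Mirrors `attention_mask.bool()`: 0.0 -> False (keep), -inf -> True (mask).
    """
    return [[bool(value) for value in row] for row in additive_mask]


def sequence_lengths(padding_mask):
    """Count kept tokens per row, like `torch.sum(~attention_mask, 1)`."""
    return [sum(not masked for masked in row) for row in padding_mask]


def has_padding(padding_mask):
    """True if any row keeps fewer tokens than the full sequence length."""
    seqlen = len(padding_mask[0])
    return any(length != seqlen for length in sequence_lengths(padding_mask))


batch = [
    [0.0, 0.0, 0.0, 0.0],          # no padding
    [0.0, 0.0, NEG_INF, NEG_INF],  # last two tokens are padding
]
mask = to_padding_mask(batch)
print(sequence_lengths(mask))  # [4, 2]
print(has_padding(mask))       # True
```

In the diff, the nested-tensor conversion happens only in the `has_padding` case; when every row is full length, the boolean mask is passed through unchanged.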


class AlbertLayerBetterTransformer(BetterTransformerBaseLayer):
def __init__(self, albert_layer, config):
r"""
3 changes: 3 additions & 0 deletions tests/bettertransformer/test_bettertransformer_encoder.py
@@ -49,6 +49,7 @@
ALL_ENCODER_DECODER_MODELS_TO_TEST = [
"hf-internal-testing/tiny-random-FSMTModel",
"hf-internal-testing/tiny-random-BartModel",
"hf-internal-testing/tiny-random-TapasModel",
Contributor:
Could you move this to ALL_ENCODER_MODELS_TO_TEST instead? Tapas is an encoder-only model ;)

Contributor:
Reading https://huggingface.co/docs/transformers/v4.24.0/en/model_doc/tapas#transformers.TapasModel

The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of cross-attention is added between the self-attention layers, following the architecture described in Attention is all you need by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.

Contributor Author:
Oh I see! Thanks for an insightful explanation! I just moved it! :)

]


@@ -87,6 +88,8 @@ def _loop_all_classes(self):
elif layer_class == "TransformerBlock":
# Hardcode it for distilbert - see https://github.com/huggingface/transformers/pull/19966
class_name = "DistilBert"
elif layer_class == "TapasLayer":
class_name = "Tapas"
JuheonChu marked this conversation as resolved.
elif "EncoderLayer" in layer_class:
class_name = layer_class[:-12]
else:
Expand Down
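
The name derivation in the `_loop_all_classes` hunk can be sketched as a standalone function. Two things here are assumptions: the function name `class_name_from_layer` is hypothetical, and the final fallback (stripping a trailing `"Layer"`) is a guess, since the hunk is cut off at `else:` and any earlier branches of the chain are not shown.

```python
ENCODER_SUFFIX = "EncoderLayer"  # 12 characters, hence the `[:-12]` slice


def class_name_from_layer(layer_class):
    """Derive a model name from a layer class name, following the hunk above."""
    if layer_class == "TransformerBlock":
        # Hardcoded for DistilBERT, as in the test file.
        return "DistilBert"
    if layer_class == "TapasLayer":
        return "Tapas"
    if ENCODER_SUFFIX in layer_class:
        return layer_class[: -len(ENCODER_SUFFIX)]
    # Assumption: the truncated `else:` branch strips a trailing "Layer".
    if layer_class.endswith("Layer"):
        return layer_class[: -len("Layer")]
    return layer_class


for name in ["TransformerBlock", "TapasLayer", "Wav2Vec2EncoderLayer", "BertLayer"]:
    print(name, "->", class_name_from_layer(name))
```

Under that assumed fallback, `"TapasLayer"` would already map to `"Tapas"` without a dedicated branch, so the explicit `elif` in the hunk may be redundant.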