Update serving code to enable saved_model=True
#18153
Conversation
@amyeroberts thank you for digging into this one - an important one, too.
Could you share more details on the local testing you're referring to here?
@sayakpaul Sure :) In the test
This will fail for vision models with the channel dim error I posted above. If I change the input signature so that the channel dimension is hard-coded, then it runs, e.g.
I run the test with
Thanks. A follow-up question. So if you run with (
```diff
@@ -227,7 +227,7 @@ def _compute_mask_indices(
         f" `sequence_length`: {sequence_length}`"
     )
     # compute number of masked spans in batch
-    num_masked_spans = int(mask_prob * sequence_length / mask_length + tf.random.uniform((1,)))
+    num_masked_spans = int(mask_prob * sequence_length / mask_length + np.random.uniform(size=(1,)))
```
Using `tf.random.uniform` here creates issues with the int casting here, the `max` on L231, and other downstream logic that assumes an int when the graph is created during saving.
@amyeroberts won't using `np` create problems in graph mode? For example, if the model was compiled?
Yep. I've removed it and added all the necessary TF logic.
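For reference, a graph-friendly version of that line might look something like this (a sketch only, with example values; the merged code may differ):

```python
import tensorflow as tf

mask_prob, mask_length = 0.05, 10    # example values
sequence_length = tf.constant(100)   # may be a symbolic tensor inside a traced function

# Keep the computation in tensor ops so no Python int() is needed at trace time:
num_masked_spans = tf.cast(
    mask_prob * tf.cast(sequence_length, tf.float32) / mask_length
    + tf.random.uniform((), minval=0.0, maxval=1.0),
    tf.int32,
)
```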
```diff
@@ -383,7 +383,8 @@ def serving(self, inputs):
             inputs (`Dict[str, tf.Tensor]`):
                 The input of the saved model as a dictionary of tensors.
         """
-        return self.call(inputs)
```
Adding in the `self.serving_output(output)` return to:
- keep in line with the rest of the models' call patterns and return types, e.g. not including loss (I don't know why this is done)
- allow for testing (see PR desc.)
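The resulting `serving`/`serving_output` pattern, sketched from the description above (the input signature varies per model):

```python
@tf.function(
    input_signature=[{"input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids")}]
)
def serving(self, inputs):
    output = self.call(inputs)
    # Transform the raw outputs (e.g. convert tuples to tensors, drop loss) so the
    # SavedModel returns the same types as the rest of the library:
    return self.serving_output(output)
```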
```diff
@@ -1127,12 +1127,12 @@ def call(
             training=training,
         )

-    # Copied from transformers.models.distilbert.modeling_tf_distilbert.TFDistilBertModel.serving_output
```
Removing `# Copied from` across models, as the tensors in the tuples for `hidden_states` and activations are different sizes, so they can't be converted with `tf.convert_to_tensor`.
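A minimal reproduction of the conversion failure (shapes are illustrative):

```python
import tensorflow as tf

# After pooling, hidden states from different blocks have different sequence
# lengths, so the tuple is not rectangular:
hidden_states = (tf.zeros((1, 8, 4)), tf.zeros((1, 4, 4)))

# tf.convert_to_tensor(hidden_states)
# InvalidArgumentError: Shapes of all inputs must match
```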
From a quick look at the Funnel docstring, I could not find why they have different sizes.

transformers/src/transformers/models/funnel/modeling_funnel.py, lines 843 to 845 in 2c5747e:

```
hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`):
    Tuple of `torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer) of
    shape `(batch_size, sequence_length, hidden_size)`.
```

Maybe the docstring is wrong and we should update it (in another PR, for sure)?
It seems that, because it uses pooling, the sequence length is different after each block:

> Since Funnel Transformer uses pooling, the sequence length of the hidden states changes after each block of layers.
```diff
-    hidden_states = tf.identity(input_features)  # TF Conv1D assumes Batch x Time x Channels, same as the input
+    hidden_states = tf.cast(
+        input_features, tf.float32
+    )  # TF Conv1D assumes Batch x Time x Channels, same as the input
```
I'm not sure what the `tf.identity` was doing here - will removing it break anything @gante?
Casting to float to enable the `tf.concat` in graph mode on L156.
If I remember correctly, this line only existed to rename the variable (and probably `hidden_states = input_features` would be enough for that 🤔)
Could it be a mutation thing? `tf.identity` returns a copy.
To be frank I can't remember if that was the cause :D But since it was my first HF PR, I would err towards a bad choice on my end
```diff
@@ -227,7 +227,7 @@ def window_reverse(windows: tf.Tensor, window_size: int, height: int, width: int
     Merges windows to produce higher resolution features.
     """
     x = shape_list(windows)[0]
-    y = tf.cast(height * width / window_size / window_size, tf.int32)
+    y = tf.cast(height * width / (window_size * window_size), tf.int32)
```
The previous calculation fails because:
- `height * width / window_size` is calculated first, before the final `window_size` division.
- `height` and `width` are ints; `window_size` is a tensor of type int.
- Therefore, the resulting `height * width / window_size` is a tensor of type float.
- This fails when dividing by `window_size` again, as the tensors are of different types.

The failure is easy to reproduce in isolation, as sketched below.
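A sketch of the dtype behaviour in isolation (values are illustrative):

```python
import tensorflow as tf

height, width = 8, 8          # Python ints
window_size = tf.constant(4)  # int32 tensor, as in the traced graph

intermediate = height * width / window_size  # true division promotes to a float tensor
# intermediate / window_size  # fails: float tensor divided by int32 tensor

y = tf.cast(height * width / (window_size * window_size), tf.int32)  # single division, OK
```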
```diff
@@ -1396,7 +1414,7 @@ def __init__(self, config: SwinConfig):
         self.classifier = (
             tf.keras.layers.Dense(config.num_labels, name="classifier")
             if config.num_labels > 0
-            else tf.identity(name="classifier")
+            else tf.keras.layers.Activation("linear", name="classifier")
```
This was a bug: `tf.identity` is an op, not a Keras layer, so constructing it without an input fails (sketch below).
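A short sketch of why a linear `Activation` layer is the right no-op placeholder here:

```python
import tensorflow as tf

# tf.identity(name="classifier")  # TypeError: missing required argument 'input'
classifier = tf.keras.layers.Activation("linear", name="classifier")  # no-op layer
x = tf.zeros((2, 10))
assert tf.reduce_all(classifier(x) == x)  # acts as the identity
```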
```python
def build(self, input_shape: tf.TensorShape):
    """
    Build shared token embedding layer Shared weights logic adapted from
    https://github.com/tensorflow/models/blob/a009f4fb9d2fc4949e32192a944688925ef78659/official/transformer/v2/embedding_layer.py#L24
    """
    self._resize_embeddings()
```
`_resize_embeddings` causes issues in graph mode because it re-adds the parameter "weights", meaning the model doesn't know what can happen to the node. Instead, here we add the weight once and then reassign its values.
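A minimal sketch of the add-once-then-reassign pattern (the class and method names here are illustrative, not the actual transformers implementation):

```python
import tensorflow as tf

class SharedEmbedding(tf.keras.layers.Layer):
    def __init__(self, vocab_size: int, hidden_size: int, **kwargs):
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size

    def build(self, input_shape):
        # Add the weight exactly once so the saved graph has a stable node.
        self.weight = self.add_weight(
            name="weights", shape=(self.vocab_size, self.hidden_size)
        )
        super().build(input_shape)

    def update_values(self, new_values: tf.Tensor):
        # Reassign values in place instead of re-adding the parameter.
        self.weight.assign(new_values)
```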
Yup, embeddings in general have to be rewritten -- they still use TF 1 code :(
```python
self.embeddings = self.add_weight(
    name="weights",  # name also used in PT
    shape=shape_list(self.embedding_weights),
    trainable=False,
```
As far as I can tell from the PT model, this should be fixed (non-trainable). Is this right @gante?
Right, it is a non-trainable positional embedding that is obtained from sines and cosines 👍
```diff
@@ -942,22 +942,6 @@ def get_embed_tokens(self):
     def set_embed_tokens(self, embed_tokens):
         self.embed_tokens = embed_tokens

-    def _prepare_decoder_attention_mask(self, attention_mask, input_shape, inputs_embeds, past_key_values_length):
```
This function causes issues in graph mode, as `combined_attention_mask` can be either a tensor or `None`. Instead, the equivalent logic from Bart's `call` method is used in this model to safely create `combined_attention_mask`.
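A hedged sketch of that approach (helper names follow the Bart-style `_make_causal_mask`/`_expand_mask` convention; the exact code differs per model):

```python
# Always build a tensor for the combined mask; only branch on the Python-level
# argument, which is resolved at trace time rather than inside the graph:
combined_attention_mask = _make_causal_mask(input_shape, past_key_values_length=past_key_values_length)
if attention_mask is not None:
    combined_attention_mask = combined_attention_mask + _expand_mask(
        attention_mask, tgt_len=input_shape[-1]
    )
```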
```diff
@@ -659,64 +659,46 @@ def __init__(
         self.drop_path = (
             TFSwinDropPath(config.drop_path_rate, name="drop_path")
             if config.drop_path_rate > 0.0
-            else tf.identity(name="drop_path")
+            else tf.keras.layers.Activation("linear", name="drop_path")
```
This was a bug - the same `tf.identity` issue as in the classifier above.
@sayakpaul Yes, I hardcode it and then run the tests. In this case, they pass.
```diff
     # make sure num masked indices <= sequence_length
-    if num_masked_spans * mask_length > sequence_length:
-        num_masked_spans = sequence_length // mask_length
+    num_masked_spans = tf.cond(
```
Same as for hubert. In graph mode, we can't use a tensor in an if/else statement. You get the following error:

`OperatorNotAllowedInGraphError: using a tf.Tensor as a Python bool is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.`
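`tf.cond` keeps both branches in the graph and selects at runtime; the replacement sketched in full (consistent with the truncated diff above):

```python
num_masked_spans = tf.cond(
    num_masked_spans * mask_length > sequence_length,
    lambda: sequence_length // mask_length,
    lambda: num_masked_spans,
)
```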
(same comment as for hubert)
In general, looks good to me 👍 Thanks for fixing all these failing cases!
P.S.: the changes to swin were pretty big and probably should have been a PR on their own, but I'm also often guilty of this sin :D Have you double-checked the slow tests for the models with the most changes?
```diff
     # make sure num masked indices <= sequence_length
-    if num_masked_spans * mask_length > sequence_length:
-        num_masked_spans = sequence_length // mask_length
+    num_masked_spans = tf.cond(
```
Looks good 👍 Perhaps for readability, if it is allowed, it can be rewritten as `tf.math.minimum(sequence_length // mask_length, num_masked_spans)`?
```python
def build(self, input_shape: tf.TensorShape):
    """
    Build shared token embedding layer Shared weights logic adapted from
    https://github.com/tensorflow/models/blob/a009f4fb9d2fc4949e32192a944688925ef78659/official/transformer/v2/embedding_layer.py#L24
    """
    self._resize_embeddings()
```
Yup, embeddings in general have to be rewritten -- they still use TF 1 code :(
```python
self.embeddings = self.add_weight(
    name="weights",  # name also used in PT
    shape=shape_list(self.embedding_weights),
    trainable=False,
```
Right, it is a non-trainable positional embedding that is obtained from sines and cosines 👍
```diff
     # make sure num masked indices <= sequence_length
-    if num_masked_spans * mask_length > sequence_length:
-        num_masked_spans = sequence_length // mask_length
+    num_masked_spans = tf.cond(
```
(same comment as for hubert)
Before merge, could you measure the timing for the tests with `python -m pytest -v tests -k "test_saved_model_creation" --durations=0 --make-reports=tests_timing` and copy-paste the results from the report?
@ydshieh We don't run the slow tests on CPU, only on GPU/multi-GPU.
I should double-check the latest version. My memory was of a previous commit.
Thank you @amyeroberts for this great work! I left a few questions.
```python
@tf.function(
    input_signature=[
        {
            "input_values": tf.TensorSpec((None, None), tf.int32, name="input_ids"),
```
I have some doubt here: `input_values` should be float, I think? Same for Wav2Vec2.
Also, the `name` should be changed too.
Updated the name 👍 Following your comment below, `input_values` is set to float and `attention_mask` and `token_type_ids` to int.
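A sketch of the corrected signature with those dtypes (the exact specs in the merged code may differ slightly):

```python
@tf.function(
    input_signature=[
        {
            "input_values": tf.TensorSpec((None, None), tf.float32, name="input_values"),
            "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"),
            "token_type_ids": tf.TensorSpec((None, None), tf.int32, name="token_type_ids"),
        }
    ]
)
def serving(self, inputs):
    output = self.call(inputs)
    return self.serving_output(output)
```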
```python
@tf.function(
    input_signature=[
        {
            "input_values": tf.TensorSpec((None, None), tf.int32, name="input_ids"),
```
`input_values` should be float? Also to change the `name`.
Updated the name 👍 Following your comment below, `input_values` is set to float and `attention_mask` and `token_type_ids` to int.
```diff
@@ -608,7 +608,7 @@ def _get_feat_extract_output_lengths(self, input_lengths: tf.Tensor):
     @tf.function(
         input_signature=[
             {
-                "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"),
+                "input_features": tf.TensorSpec((None, None, None), tf.int32, name="input_ids"),
```
`input_features` should be float? Also to change the `name`?
I've updated the name.
For the types, from the PyTorch model, `input_features`, `decoder_inputs` and `decoder_attention_mask` should be int32. `attention_mask` isn't specified, but `test_saved_model_creation` fails if it's set to float32.
From the 2 places:

transformers/tests/test_modeling_common.py, lines 1720 to 1721 in 2c5747e:

```python
elif key == "input_features":
    tf_inputs_dict[key] = tf.convert_to_tensor(tensor.cpu().numpy(), dtype=tf.float32)
```

and transformers/tests/test_modeling_tf_common.py, lines 531 to 532 in 2c5747e:

```python
elif name == "input_features":
    pt_inputs_dict[name] = torch.from_numpy(key.numpy()).to(torch.float32)
```

as well as the docstring (the 2nd line below), transformers/src/transformers/models/speech_to_text/modeling_speech_to_text.py, lines 602 to 603 in 2c5747e:

```
input_features (`torch.LongTensor` of shape `(batch_size, sequence_length, feature_size)`):
    Float values of fbank features extracted from the raw speech waveform. Raw speech waveform can be obtained
```

I still think `input_features` should be float. The `torch.LongTensor` part seems to be a mistake.
(cc @patrickvonplaten @anton for confirmation.)
OK, that's a good point. I've updated it to float32 (passes tests locally)
Add suggestions made by Joao Gante:
* Use tf.shape instead of shape_list
* Use @tooslow decorator on tests
* Simplify some of the logic

Address Yih-Dar Sheih comments - making tensor names consistent and making types float
@ydshieh I know from @sgugger's comment that we don't run on CPU, but I ran the tests for reference (MacBook Pro 2021 M1 Max).
I noticed a cool thing -- assuming the models with
@amyeroberts can you add these issues to your
@ydshieh @Rocketknight1 Am I OK to merge?
Thanks again!
Overall, this is an extremely strong PR. The model changes were comprehensive and I'm impressed that you got the `SavedModel` test back into common!
* Add serving_output and serving methods to some vision models
* Add serving outputs for DeiT
* Don't convert hidden states - differing shapes
* Make saveable
* Fix up
* Make swin saveable
* Add in tests
* Fix funnel tests (can't convert to tensor)
* Fix numpy call
* Tidy up a bit
* Add in hidden states - resnet
* Remove numpy
* Fix failing tests - tensor shape and skipping tests
* Remove duplicated function
* PR comments - formatting and var names
* PR comments: add suggestions made by Joao Gante (use tf.shape instead of shape_list, use @tooslow decorator on tests, simplify some of the logic)
* PR comments: address Yih-Dar Sheih comments (making tensor names consistent and making types float)
* Types consistent with docs; disable test on swin (slow)
* CI trigger
* Change input_features to float32
* Add serving_output for segformer
* Fixup

Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>
What does this PR do?
Fixes and adds any missing `serving` and `serving_output` code to our TF models to enable `model.save_pretrained(path, saved_model=True)`.

I've added comments throughout the code to explain any areas where the TF refactor might not be obvious.

I'm aware the diff of this PR is quite big, but most of it is repetitive changes to enable passing of tests, so I hope it's acceptable.
Specifically:

1. Adds missing serving logic to: ResNet, Swin, TAPAS.
2. Adds missing `serving_output` logic to models. Some vision models didn't have `serving_output` implemented - `serving` returned the model outputs directly. This was to enable testing (see 4.) and to keep consistent with the rest of the library.
3. Updates or adds the `input_signature` decorator for models.
4. Adds a test to check `serving_output` is implemented and return types are as expected.

We can't test `model.serving` directly, i.e. this is not possible:

Running this on vision models raises the following:

This is because the input signature defined in the `tf.function` decorator for the `serving` method has all of the input dimensions defined as `None`, along the lines of the illustrative sketch below:
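(Illustrative; the exact spec lives in each model's `serving` decorator.)

```python
@tf.function(
    input_signature=[
        {
            "pixel_values": tf.TensorSpec((None, None, None, None), tf.float32, name="pixel_values"),
        }
    ]
)
def serving(self, inputs):
    output = self.call(inputs)
    return self.serving_output(output)
```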
We can't hard-code the channel dimension as e.g. `3`, as we want to support both RGB and greyscale images (although testing this locally does work).

5. Moves `test_saved_model_creation` back into `test_modeling_tf_common` and adds explicit skips.

There were quite a few models that couldn't be saved with `model.save_pretrained(path, saved_model=True)`, and quite a few whose input signature or return types from `serving_output` were broken or inconsistent with the model. I think this is in part because the relevant test was moved to only be applied to certain core models, and those models didn't explicitly skip. See #14415, #9478.
I've added the `unittest.skip` decorator, so the test is counted as skipped rather than passed, on all models that were previously skipping (see the sketch after the next item).

6. Updates logic in models such that their graph can be created and saved.
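The explicit skips mentioned in 5. look along these lines (an illustrative sketch; the test class name is hypothetical):

```python
import unittest

class TFExampleModelTest(unittest.TestCase):  # hypothetical test class
    @unittest.skip(reason="Model cannot be saved with saved_model=True yet")
    def test_saved_model_creation(self):
        pass
```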
Adds serving logic to enable saving of models and ensures their outputs are transformed in line with the rest of the library.
Fixes:
- #18179
- #18164
- #17233
- #17285
- https://discuss.huggingface.co/t/tfresnetforimageclassification-fails-with-save-pretrained-when-saved-model-is-true/20404