TF: use the correct config with (...)EncoderDecoder models #18097

Conversation
Force-pushed from 07c1575 to 0dc94b8
Thanks for fixing! As for your question, it depends on whether they are all exact copies that can easily be refactored into one mixin or not.
I believe they are -- going to give it a go afterwards if @ydshieh also agrees :)
I have limited connection at the moment in the mountains, so feel free to merge if you prefer. Regarding the common mixin, good for me. I see there are a few little things to address, like the input names (input_ids, pixel_values, etc.). Thank you for the fix, @gante
@ydshieh can I have a review plz 🙏
Thank you for the fix, @gante. Would you mind rebasing on main and running test_pt_tf_model_equivalence for TF (Vision)EncoderDecoderModel? Thank you.
unpacked_inputs = input_processing(func, self.config, main_input, **fn_args_and_kwargs)

# Encoder Decoder models delegate the application of the configuration options to their inner models.
if "encoder_decoder" in str(self).lower():
Just a nit, no need to feel strongly about it: I would personally prefer to use self.__name__ instead of str, and test against EncoderDecoder. If I understand correctly, str gives the full path, which includes the module name like encoder_decoder or vision_encoder_decoder, right?
I was trying self.__name__ on this line, and it seems it isn't defined in some cases -- e.g. in tests/models/vision_encoder_decoder/test_modeling_tf_vision_encoder_decoder.py::TFViT2GPT2EncoderDecoderModelTest::test_encoder_decoder_model, it throws *** AttributeError: 'TFVisionEncoderDecoderModel' object has no attribute '__name__'

str(self) here includes the full import path for the object, like you wrote, which contains the class name -- '<transformers.models.vision_encoder_decoder.modeling_tf_vision_encoder_decoder.TFVisionEncoderDecoderModel object at 0x7fc7783b8a60>'
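The behavior described above is standard Python, not transformers-specific: an instance has no __name__ attribute (only the class object does), while str() on a plain object falls back to the default repr, which embeds the class's module path. A minimal sketch, using an invented class name:

```python
class FakeVisionEncoderDecoderModel:
    pass


model = FakeVisionEncoderDecoderModel()

# Instances do not carry __name__; the class object does.
assert not hasattr(model, "__name__")
assert type(model).__name__ == "FakeVisionEncoderDecoderModel"

# str(instance) uses the default repr: "<module.ClassName object at 0x...>",
# so it contains both the module path and the class name.
assert "FakeVisionEncoderDecoderModel" in str(model)
```

This is why type(self).__name__ would also have worked as an alternative to str(self), had the refactor been pursued.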
Maybe we can update in the future, but since it is working for now I'm going with it :D
OK, thanks for the info.
if use_cache:
    past_key_values = decoder_outputs[1]
# The starting index of the remaining elements in `decoder_outputs`
start_index = sum([1 if x is not None else 0 for x in (loss, logits, past_key_values)])
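To illustrate the start_index computation in the snippet above: it counts how many of the leading optional outputs are present, which gives the position where the remaining decoder_outputs elements begin. A small standalone sketch with invented values:

```python
# Hypothetical output values: loss is unset, logits and past_key_values are present.
loss = None
logits = [[0.1, 0.9]]
past_key_values = ("cached_layer_0",)

# Count the non-None leading outputs to find where the remainder starts.
start_index = sum([1 if x is not None else 0 for x in (loss, logits, past_key_values)])
assert start_index == 2  # loss contributes 0, the other two contribute 1 each
```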
Thanks for the improvement / cleanup!
decoder_attention_mask = decoder_attention_mask[:, :-1]
encoder_model, decoder_model = self.get_encoder_decoder_model(config, decoder_config)
enc_dec_model = EncoderDecoderModel(encoder=encoder_model, decoder=decoder_model)
enc_dec_model.config.output_attentions = True  # model config -> won't work
nice addition!
Force-pushed from 0dc94b8 to 250d5fd
@ydshieh rebased with main and reran tests -- all working 👍
Force-pushed from 250d5fd to f59bbb7
What does this PR do?

Fixes #18071

Modifies unpack_inputs to ignore the config file for (...)EncoderDecoder models, mimicking the behavior in PT. If we don't ignore it, then unset options will get set with the config's default (False for most of them), causing the inner models to ignore their own config files.

A config-related test was also added to the EncoderDecoder models. I then noticed that other (...)EncoderDecoder tests have copy/pasted their own EncoderDecoderMixin, so I've left the other classes for a follow-up PR with the following question: should a common EncoderDecoderMixin be defined and shared across (...)EncoderDecoder tests, or should I add a similar test to all other classes individually?
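The delegation logic the PR describes can be sketched in isolation. This is a hypothetical, heavily simplified model of the behavior -- the names (resolve_option, Config, FakeEncoderDecoderModel) are invented and this is not the real unpack_inputs from transformers. Unset call-time options normally fall back to the model config's default; for encoder-decoder wrappers, the fix instead leaves them unset so the inner encoder/decoder configs take effect:

```python
class Config:
    def __init__(self, output_attentions=False):
        self.output_attentions = output_attentions


class PlainModel:
    def __init__(self, config):
        self.config = config


class FakeEncoderDecoderModel(PlainModel):
    pass


# Simulate a real module path such as
# transformers.models.encoder_decoder.modeling_tf_encoder_decoder,
# which is what the `"encoder_decoder" in str(self)` check relies on.
FakeEncoderDecoderModel.__module__ = "models.encoder_decoder.modeling_tf"


def resolve_option(model, value, attr):
    """Resolve a boolean option from the call site or, failing that, the config."""
    if value is not None:
        return value  # explicitly passed at call time: always wins
    # Encoder-decoder wrappers delegate config options to their inner models,
    # so leave the option unset instead of applying the outer config default.
    if "encoder_decoder" in str(model).lower():
        return None
    return getattr(model.config, attr)


plain = PlainModel(Config(output_attentions=False))
wrapper = FakeEncoderDecoderModel(Config(output_attentions=False))

assert resolve_option(plain, None, "output_attentions") is False   # outer config default
assert resolve_option(wrapper, None, "output_attentions") is None  # inner configs decide
assert resolve_option(wrapper, True, "output_attentions") is True  # explicit value wins
```

Under this sketch, the test added in the PR (setting enc_dec_model.config.output_attentions = True and observing it has no effect) corresponds to the wrapper branch returning None regardless of the outer config.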