Implementation of SuperPoint and AutoModelForInterestPointDescription #25786
Conversation
- Added the SuperPointConfig
- Added the SuperPointModel and its implementation
- Added new ImagePointDescriptionOutput dataclass and added it in AutoModels
@amyeroberts I have some questions about the implementation and the AutoModel classes. First of all, I've tried to follow as much as possible the patterns I see in other model implementations (resnet or convnextv2, for example), but unlike these models, SuperPoint really only has one function or "mode": outputting the keypoints, their scores and descriptors. This is why I only implemented SuperPoint as a SuperPointModel. Then I added the ImagePointDescriptionOutput dataclass, and finally added it in the AutoModels. Apart from adding the AutoModelForInterestPointDescription, I couldn't find how to define such inputs and outputs. Is it a new pipeline I should define, or something else?
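For context, the ImagePointDescriptionOutput mentioned above would be a ModelOutput dataclass. Here is a minimal hedged sketch; the exact fields (keypoints, scores, descriptors, plus the hidden_states support mentioned later in this thread) are assumptions, not the final API:

```python
# Minimal sketch of what such an output dataclass could look like; field names
# are assumptions based on this discussion, not the final API.
from dataclasses import dataclass
from typing import Optional, Tuple

import torch

from transformers.utils import ModelOutput


@dataclass
class ImagePointDescriptionOutput(ModelOutput):
    keypoints: Optional[torch.FloatTensor] = None    # (batch_size, num_keypoints, 2)
    scores: Optional[torch.FloatTensor] = None       # (batch_size, num_keypoints)
    descriptors: Optional[torch.FloatTensor] = None  # (batch_size, num_keypoints, dim)
    hidden_states: Optional[Tuple[torch.FloatTensor]] = None
```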
- Divided SuperPointModel into multiple submodules to handle hidden states in ModelOutput support (see ImagePointDescriptionOutput).
- Added mandatory information to the SuperPointPreTrainedModel class, such as the main input name and the supports_gradient_checkpointing boolean.
- Added weight initialization.
- Added imports to transformers.__init__.py.
Hi @sbucaille, This is a bit of a special case. For other models which only perform a single task, what we normally do is just have a single model class.
@sbucaille From next week, I'll be off for a few weeks. If you have vision-specific questions, please ping @rafaelpadilla; for implementation questions, @ArthurZucker.
```diff
@@ -298,3 +298,41 @@ def forward(
         last_hidden_state=last_hidden_state,
         hidden_states=encoder_outputs.hidden_states,
     )
```
SuperPointModelForInterestPointDescription is more or less the exact same as SuperPointModel. I just use it for my own understanding of the transformers library, since I'm taking inspiration from other implementations (like ConvNextV2). So either SuperPointModel or SuperPointModelForInterestPointDescription might be deleted later.
In transformers, the class XXXModel does not contain the head on top, and the classes XXXForYYY add different heads. XXX represents the model name and YYY represents a task.
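As an illustration of this convention, here is a hedged sketch; the classes and their internals are hypothetical, not actual transformers code:

```python
# Illustrative (hypothetical) classes showing the XXXModel / XXXForYYY naming
# convention; these are not actual transformers classes.
import torch
import torch.nn as nn


class XXXModel(nn.Module):
    """Bare backbone: returns hidden states, no task-specific head."""

    def __init__(self, num_channels: int = 3, hidden_size: int = 64):
        super().__init__()
        self.encoder = nn.Conv2d(num_channels, hidden_size, kernel_size=3, padding=1)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        return self.encoder(pixel_values)


class XXXForYYY(nn.Module):
    """Backbone plus a task head, e.g. YYY = ImageClassification."""

    def __init__(self, num_channels: int = 3, hidden_size: int = 64, num_labels: int = 10):
        super().__init__()
        self.model = XXXModel(num_channels, hidden_size)
        self.head = nn.Linear(hidden_size, num_labels)  # the task-specific "head"

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        hidden_states = self.model(pixel_values)
        pooled = hidden_states.mean(dim=(-2, -1))  # global average pooling
        return self.head(pooled)
```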
So the proposed implementation is correct regarding this convention? Because I can't really tell what would be considered a head in SuperPoint. Or should SuperPointModel only contain the encoder, and SuperPointForInterestPointDescription contain the keypoint_decoder and descriptor_decoder? And in that case, should SuperPointModel.forward() output a BaseModelOutputWithPoolingAndNoAttention, similar to ConvNextV2?
Looking at the paper and code structure, SuperPoint is a traditional CNN-based model, and I couldn't identify what would count as the head. The encoder is simply a CNN which returns features (last_hidden). The two decoders, keypoint_decoder and descriptor_decoder, return keypoints+scores and descriptors, respectively. So I think the structure that best fits this case is the following (see the sketch after the list):
- The SuperPointModel should only contain the encoder, a SuperPointEncoder object. Then, SuperPointModel.forward() should output a BaseModelOutputWithNoAttention object.
- The SuperPointForInterestPointDescription should contain the superpoint, a SuperPointModel object, and both keypoint_decoder and descriptor_decoder. Then, SuperPointForInterestPointDescription.forward() should output an ImagePointDescriptionOutput object.
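A rough sketch of the two-class split proposed above. The module internals here are placeholders (the real encoder and decoders come from the SuperPoint paper); only the class and attribute names follow this comment:

```python
# Sketch of the proposed structure; internals are placeholders, not the
# actual SuperPoint layers.
import torch
import torch.nn as nn

from transformers.modeling_outputs import BaseModelOutputWithNoAttention


class SuperPointEncoder(nn.Module):
    """Placeholder for the CNN encoder that returns feature maps."""

    def __init__(self, hidden_size: int = 256):
        super().__init__()
        self.conv = nn.Conv2d(1, hidden_size, kernel_size=3, padding=1)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        return self.conv(pixel_values)


class SuperPointModel(nn.Module):
    """Encoder only; forward outputs a BaseModelOutputWithNoAttention."""

    def __init__(self, hidden_size: int = 256):
        super().__init__()
        self.encoder = SuperPointEncoder(hidden_size)

    def forward(self, pixel_values: torch.Tensor) -> BaseModelOutputWithNoAttention:
        return BaseModelOutputWithNoAttention(
            last_hidden_state=self.encoder(pixel_values)
        )


class SuperPointForInterestPointDescription(nn.Module):
    """Wraps SuperPointModel and adds the two decoders."""

    def __init__(self, hidden_size: int = 256, descriptor_dim: int = 256):
        super().__init__()
        self.superpoint = SuperPointModel(hidden_size)
        # Placeholder decoder heads; the real ones post-process their feature
        # maps into keypoints + scores and descriptors.
        self.keypoint_decoder = nn.Conv2d(hidden_size, 65, kernel_size=1)
        self.descriptor_decoder = nn.Conv2d(hidden_size, descriptor_dim, kernel_size=1)

    def forward(self, pixel_values: torch.Tensor):
        features = self.superpoint(pixel_values).last_hidden_state
        # In the real model these maps would be post-processed and wrapped in
        # an ImagePointDescriptionOutput (keypoints, scores, descriptors).
        return self.keypoint_decoder(features), self.descriptor_decoder(features)
```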
@amyeroberts, what do you think of this structure? Does it make sense?
In this case, I would just have SuperPointModel, which contains the encoder and decoders. In modeling_auto.py, AutoModelForInterestPointDescription should then map to loading this model.
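For reference, here is a hedged sketch of how that auto-class wiring typically looks in modeling_auto.py; the mapping variable name is an assumption mirroring existing AutoModelForXxx classes, and the private helpers are internal library details:

```python
# Hypothetical sketch of the wiring in src/transformers/models/auto/modeling_auto.py,
# mirroring how existing AutoModelForXxx classes are registered. The mapping
# variable name is an assumption.
from collections import OrderedDict

from transformers.models.auto.auto_factory import _BaseAutoModelClass, _LazyAutoMapping
from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES

MODEL_FOR_INTEREST_POINT_DESCRIPTION_MAPPING_NAMES = OrderedDict(
    [
        # model type -> class containing the encoder and both decoders
        ("superpoint", "SuperPointModel"),
    ]
)

MODEL_FOR_INTEREST_POINT_DESCRIPTION_MAPPING = _LazyAutoMapping(
    CONFIG_MAPPING_NAMES, MODEL_FOR_INTEREST_POINT_DESCRIPTION_MAPPING_NAMES
)


class AutoModelForInterestPointDescription(_BaseAutoModelClass):
    _model_mapping = MODEL_FOR_INTEREST_POINT_DESCRIPTION_MAPPING
```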
If I understood correctly, I need to remove every mention of SuperPointForInterestPointDescription and only keep SuperPointModel, which can be instantiated by AutoModelForInterestPointDescription? This commit reflects these changes; let me know if I misunderstood something.
… and actually added the tests
…ch the original model outputs
- Filled SuperPoint integration tests with the pretrained model with shape and value checks on the outputs of the model
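As an illustration of the shape and value checks mentioned in the commit above, such a test might look roughly like this; the output fields and expected sizes are assumptions for illustration only:

```python
# Hedged sketch of a shape-check integration test; output field names and
# expected sizes are assumptions, not the actual test code.
import torch


def check_superpoint_outputs(model, processor, image):
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Keypoints are 2D coordinates; one score and one descriptor per keypoint.
    assert outputs.keypoints.shape[-1] == 2
    assert outputs.scores.shape == outputs.keypoints.shape[:-1]
    assert outputs.descriptors.shape[:-1] == outputs.keypoints.shape[:-1]
```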
Hi @ArthurZucker, I added the SuperPointImageProcessor as part of the code because SuperPoint requires a grayscale image as input. But when I added the tests, test_call_pil fails and gives me a very weird error when it reaches these lines. Not sure what causes the problem, as I tried to compare with tests made with the ConvNextImageProcessor, which does not raise any error. Anyway, I'll continue on the implementation; let me know if I'm missing anything. I'll write the documentation for all the code I've previously pushed.
Hi @sbucaille, 🙂 A quick bit of help with that issue: I see that your processing converts all images to grayscale (here) and the tests are failing here. The root cause has been discussed in PR #25767 and is not fully solved yet. A quick solution for this issue in your code may be possible. Note that a single-channel image is definitely grayscale, but if the 3 channels of an RGB image are equal (R==G and G==B), the image is also effectively grayscale. So, if you replicate the channels of your 1-channel grayscale image as done here, this issue can be solved. However, SuperPoint would need to be adapted for that: you would only need to consider one of the RGB channels, as they are equal. A sketch of this workaround follows.
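A minimal sketch of the suggested workaround, replicating the single grayscale channel into three identical channels; the helper name is illustrative, not an actual transformers API:

```python
# Replicate the single grayscale channel into three identical channels so
# that R == G == B. The helper name is illustrative only.
import numpy as np


def replicate_grayscale_channels(image: np.ndarray) -> np.ndarray:
    """Turn an (H, W) or (H, W, 1) grayscale array into an (H, W, 3) array."""
    if image.ndim == 2:
        image = image[..., None]
    if image.shape[-1] == 1:
        image = np.repeat(image, 3, axis=-1)
    return image


# The model would then read a single channel, since all three are equal, e.g.:
# pixel_values = pixel_values[:, :1, :, :]  # keep one channel (N, 1, H, W)
```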
# Conflicts:
#	src/transformers/models/superpoint/modeling_superpoint.py
Hi @rafaelpadilla and @ArthurZucker, please let me know what I'm missing in the implementation! 🙂
Hi @sbucaille, It seems that the original code is under the MIT license. If that is the case, you just need to add the MIT license at the top of the files, as done in graphormer and IDEFICS. The checkpoints seem to be under a non-commercial customized license. So, as you have already added the license here, you just need to set it accordingly.
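For illustration, a hedged sketch of such a file header; the exact wording and copyright holders are placeholders and should be copied from the graphormer/IDEFICS files rather than from here:

```python
# Sketch of an MIT-style file header; year and copyright holders below are
# placeholders, not the actual required wording.
#
# Copyright 2023 <original authors> and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the MIT License; see https://opensource.org/licenses/MIT
# for the full license text.
```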
…rPointEncoder convolution layers are abstracted into SuperPointConvBlock. SuperPointConfig now has different attributes to separate encoder and decoders parameters. convert_superpoint_to_pytorch.py has been changed accordingly.
…uperPoint code to comply with library standard
… tensor shape manipulations
Sorry about this, but I am afraid there is nothing we can do to help in this situation (at least not from me).
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
This PR implements SuperPoint, one of the few models that generate keypoints and descriptors given an image, as discussed in this previous pull request.
The goal is to implement this model and a new type of AutoModel: AutoModelForInterestPointDescription (name to be discussed).

Who can review?
@amyeroberts @ArthurZucker
TODOs
- AutoModelForInterestPointDescription mapping