
Implementation of SuperPoint and AutoModelForInterestPointDescription #25786

Closed · wants to merge 208 commits

Conversation


@sbucaille sbucaille commented Aug 27, 2023

What does this PR do?

This PR implements SuperPoint, one of the few models that generate keypoints and descriptors given an image, as discussed in this previous pull request.
The goal is to implement this model and a new type of AutoModel: AutoModelForInterestPointDescription (name to be discussed).
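To make the intended usage concrete, here is a minimal sketch; the checkpoint name and the exact output field names are assumptions for illustration, not a settled API:

```python
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForInterestPointDescription

# Hypothetical checkpoint name, for illustration only.
checkpoint = "sbucaille/superpoint"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForInterestPointDescription.from_pretrained(checkpoint)

image = Image.open("scene.jpg")
inputs = processor(image, return_tensors="pt")
outputs = model(**inputs)

# Assumed fields of the proposed ImagePointDescriptionOutput dataclass:
# keypoint coordinates, their confidence scores, and descriptor vectors.
print(outputs.keypoints.shape, outputs.scores.shape, outputs.descriptors.shape)
```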

Who can review?

@amyeroberts @ArthurZucker

TODOs

  • Implement SuperPointConfig and SuperPointModel as PretrainedConfig and PreTrainedModel
  • Generate a conversion script for the original weights
  • Implement the new AutoModelForInterestPointDescription mapping
  • Test the model
  • Write documentation

- Added the SuperPointConfig
- Added the SuperPointModel and its implementation
- Added new ImagePointDescriptionOutput dataclass
@sbucaille (Contributor Author)

@amyeroberts I have some questions about the implementation and AutoModel classes.

First of all, I tried to follow the patterns I see in other model implementations (resnet or convnextv2, for example) as much as possible, but unlike those models, SuperPoint really only has one function or "mode": outputting the keypoints, their scores, and descriptors. This is why I only implemented SuperPoint as SuperPointModelForInterestPointDescription, so there is no SuperPointModel anymore. Does that seem OK?

Then I added this SuperPointModelForInterestPointDescription class to a new mapping dictionary in the modeling_auto file and added the appropriate AutoModel class for it. But are these kinds of changes usually generated by an automated script for model registration, or is adding them by hand appropriate?

Finally,

In that PR, we can also add a mapping AutoModelForInterestPointDescription, which we define as taking two images and returning interest keypoints and their descriptions.

Apart from adding the AutoModelForInterestPointDescription, I couldn't find how to define such inputs and outputs. Is it a new pipeline I should define, or something else?

 - Divided SuperPointModel into multiple submodules to support hidden states in the ModelOutput (see ImagePointDescriptionOutput).
 - Added mandatory information to the SuperPointPreTrainedModel class, such as the main input name and the supports_gradient_checkpointing boolean.
 - Added weight initialization
 - Added imports to transformers.__init__.py
@amyeroberts (Collaborator)

Hi @sbucaille,

This is a bit of a special case. For other models which only perform a single task, what we normally do is just have XxxModel. I'd suggest doing this. We can still add AutoModelForInterestPointDescription and have SuperPointModel loaded by it.

@amyeroberts (Collaborator)

@sbucaille From next week, I'll be off for a few weeks. If you have vision-specific questions, please ping @rafaelpadilla; for implementation questions, @ArthurZucker.

@@ -298,3 +298,41 @@ def forward(
last_hidden_state=last_hidden_state,
hidden_states=encoder_outputs.hidden_states,
)


@sbucaille (Contributor Author):

SuperPointModelForInterestPointDescription is more or less the exact same as SuperPointModel.
I just use it for my own understanding of the transformers library, since I'm taking inspiration from other implementations (like ConvNextV2). So either SuperPointModel or SuperPointModelForInterestPointDescription might be deleted later.

(Contributor):

In transformers, the class XXXModel does not contain the head on top, and the XXXForYYY classes add different heads. XXX represents the model name and YYY represents a task.
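To illustrate that convention with placeholder names (a generic sketch, not actual transformers code; real classes subclass PreTrainedModel and take a config object):

```python
import torch.nn as nn

class XXXModel(nn.Module):
    """Bare backbone: returns hidden states, with no task head on top."""
    def __init__(self, hidden_size: int = 64):
        super().__init__()
        self.encoder = nn.Conv2d(3, hidden_size, kernel_size=3, padding=1)

    def forward(self, pixel_values):
        return self.encoder(pixel_values)

class XXXForYYY(nn.Module):
    """Backbone plus a head for task YYY."""
    def __init__(self, hidden_size: int = 64, num_labels: int = 10):
        super().__init__()
        self.model = XXXModel(hidden_size)
        self.head = nn.Conv2d(hidden_size, num_labels, kernel_size=1)  # the task-specific head

    def forward(self, pixel_values):
        return self.head(self.model(pixel_values))
```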

@sbucaille (Contributor Author) commented Oct 14, 2023:

So the proposed implementation is correct regarding this convention? Because I can't really tell what would be considered the head in SuperPoint. Or should SuperPointModel only contain the encoder, and SuperPointForInterestPointDescription contain the keypoint_decoder and descriptor_decoder? And in that case, should SuperPointModel.forward() output a BaseModelOutputWithPoolingAndNoAttention similar to ConvNextV2?

(Contributor):

Looking at the paper and code structure, SuperPoint is a traditional CNN-based model, and I couldn't identify what would count as the head.

The encoder is simply a CNN which returns features (last_hidden). The two decoders, keypoint_decoder and descriptor_decoder, return keypoints+scores and descriptors, respectively. So I think the structure that best fits this case is:

  • The SuperPointModel should only contain the encoder, a SuperPointEncoder object. Then, SuperPointModel.forward() should output a BaseModelOutputWithNoAttention object.
  • The SuperPointForInterestPointDescription should contain the superpoint, a SuperPointModel object, and both keypoint_decoder and descriptor_decoder. Then, SuperPointForInterestPointDescription.forward() should output an ImagePointDescriptionOutput object.

@amyeroberts, what do you think of this structure? Does it make sense?
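A minimal sketch of that proposal, with stub modules standing in for the real CNN blocks (channel sizes here are assumptions; real classes would subclass SuperPointPreTrainedModel and return the proper ModelOutput objects):

```python
import torch.nn as nn

class SuperPointEncoder(nn.Module):
    """Stub for the convolutional backbone that returns features."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 128, kernel_size=3, padding=1)

    def forward(self, pixel_values):
        return self.conv(pixel_values)

class SuperPointModel(nn.Module):
    """Encoder only; forward() would return a BaseModelOutputWithNoAttention."""
    def __init__(self):
        super().__init__()
        self.encoder = SuperPointEncoder()

    def forward(self, pixel_values):
        return self.encoder(pixel_values)

class SuperPointForInterestPointDescription(nn.Module):
    """Backbone plus decoders; forward() would return an ImagePointDescriptionOutput."""
    def __init__(self):
        super().__init__()
        self.superpoint = SuperPointModel()
        self.keypoint_decoder = nn.Conv2d(128, 65, kernel_size=1)     # keypoints + scores
        self.descriptor_decoder = nn.Conv2d(128, 256, kernel_size=1)  # descriptors

    def forward(self, pixel_values):
        features = self.superpoint(pixel_values)
        return self.keypoint_decoder(features), self.descriptor_decoder(features)
```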

(Collaborator):

In this case, I would just have SuperPointModel, which contains the encoder and decoders. In modeling_auto.py, AutoModelForInterestPointDescription should then map to loading this model.
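A hedged sketch of what that registration could look like inside models/auto/modeling_auto.py, following the pattern of the existing task mappings (the mapping name is the one proposed in this PR; _LazyAutoMapping, _BaseAutoModelClass, and CONFIG_MAPPING_NAMES are helpers that file already defines and imports):

```python
from collections import OrderedDict

# Inside modeling_auto.py, alongside the other task mappings:
MODEL_FOR_INTEREST_POINT_DESCRIPTION_MAPPING_NAMES = OrderedDict(
    [
        ("superpoint", "SuperPointModel"),
    ]
)

MODEL_FOR_INTEREST_POINT_DESCRIPTION_MAPPING = _LazyAutoMapping(
    CONFIG_MAPPING_NAMES, MODEL_FOR_INTEREST_POINT_DESCRIPTION_MAPPING_NAMES
)

class AutoModelForInterestPointDescription(_BaseAutoModelClass):
    _model_mapping = MODEL_FOR_INTEREST_POINT_DESCRIPTION_MAPPING
```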

@sbucaille (Contributor Author):

If I understood correctly, I need to remove every mention of SuperPointForInterestPointDescription and only keep SuperPointModel, which can be instantiated by AutoModelForInterestPointDescription? This commit reflects these changes; let me know if I misunderstood something.

@sbucaille (Contributor Author)

Hi @ArthurZucker, I added the SuperPointImageProcessor as part of the code because SuperPoint requires a grayscale image as input. But when I added the tests, test_call_pil fails and gives me a very weird error when it reaches these lines

tests/models/superpoint/test_image_processing_superpoint.py:43: in prepare_image_inputs
    return prepare_image_inputs(
tests/test_image_processing_common.py:64: in prepare_image_inputs
    image_inputs = [Image.fromarray(np.moveaxis(image, 0, -1)) for image in image_inputs]
tests/test_image_processing_common.py:64: in <listcomp>
    image_inputs = [Image.fromarray(np.moveaxis(image, 0, -1)) for image in image_inputs]

with the following error:

            except KeyError as e:
                msg = "Cannot handle this data type: %s, %s" % typekey
>               raise TypeError(msg) from e
E               TypeError: Cannot handle this data type: (1, 1, 1), |u1

I'm not sure what causes the problem, as I tried to compare with the tests for the ConvNextImageProcessor, which does not raise any error.

Anyway, I'll continue on the implementation; let me know if I'm missing anything. I'll write the documentation for all the code I've previously pushed.

@rafaelpadilla (Contributor)

> [quoting @sbucaille's comment above about the failing test_call_pil test]

Hi @sbucaille, 🙂

A quick help with that issue:

I see that your processor converts all images to grayscale (here), and the tests are failing here.

The root cause is that the convert_to_grayscale function returns a 1-channel image (luminance), as here. So, when it is later converted to a numpy array, it ends up as a 1-channel array, making the test fail.

This has been discussed in PR #25767 and is not fully solved yet.

A quick solution for this issue may be possible in your code. A single-channel image is definitely grayscale, but an RGB image whose 3 channels are equal (R==G and G==B) is also considered grayscale. So, if you replicate the channels of your 1-channel grayscale image as in here, this issue can be solved (see the sketch below). However, SuperPoint would need to be adapted for that: you would only need to consider one of the RGB channels, as they are all equal.
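A minimal sketch reproducing the failure and the suggested channel replication (assuming a uint8 array, as in the common test helper):

```python
import numpy as np
from PIL import Image

# Reproduces the failure: PIL cannot build an image from an (H, W, 1) uint8 array.
gray = np.zeros((16, 16, 1), dtype=np.uint8)
# Image.fromarray(gray)  # TypeError: Cannot handle this data type: (1, 1, 1), |u1

# Suggested workaround: replicate the single channel into 3 equal RGB channels,
# so the image stays grayscale in content (R == G == B) but PIL-compatible in shape.
rgb = np.repeat(gray, 3, axis=-1)  # shape (16, 16, 3)
img = Image.fromarray(rgb)         # works
```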

@sbucaille (Contributor Author) commented Sep 23, 2023

Hi @rafaelpadilla and @ArthurZucker,
Thanks @rafaelpadilla for the heads-up; I adapted SuperPointModel and SuperPointImageProcessor to cover this issue. SuperPointImageProcessor now generates a 3-channel grayscale image from a given input image, and SuperPointModel extracts one of the channels to perform the forward method (see the sketch below). It may be necessary to change that in the future if 1-channel images become supported (if that is planned).
I added docs, as well as remaining integration tests for the AutoModelForInterestPointDescription.
I think the implementation is complete.
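For reference, the channel extraction mentioned above can be a one-liner in the forward pass; a sketch (the helper name is mine, not from the PR):

```python
import torch

def extract_grayscale_channel(pixel_values: torch.Tensor) -> torch.Tensor:
    """The processor outputs R == G == B, so any single channel carries the image."""
    return pixel_values[:, 0:1, :, :]  # (batch, 3, H, W) -> (batch, 1, H, W)
```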

@ArthurZucker, please let me know what I'm missing in the implementation! 🙂
I do have some questions:

  • What should I do with SuperPointModel and SuperPointModelForInterestPointDescription, as both are basically the same? Should I only keep the latter?
  • Regarding docs, there is a mention of expected_output in @add_code_sample_docstrings. I decided not to provide this information, since the output is dynamic and depends on the number of keypoints found. Should I keep it like that, or is there a way to provide a "dynamic shape" to this function?
  • Regarding tests, test_model_is_small is failing; what should I do about it? And is test_retain_grad_hidden_states_attentions related to models that can be trained? If so, we should probably skip it (see the sketch below), since SuperPoint can't be trained and also does not have attentions.
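For the last point, the usual pattern in the transformers test suite is to override the common test and skip it; a sketch (assuming the standard ModelTesterMixin setup from tests/test_modeling_common.py):

```python
import unittest

class SuperPointModelTest(ModelTesterMixin, unittest.TestCase):
    # SuperPoint has no attention layers and is not meant to be trained,
    # so this common test does not apply.
    @unittest.skip(reason="SuperPoint does not output attentions")
    def test_retain_grad_hidden_states_attentions(self):
        pass
```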

@sbucaille (Contributor Author) commented Sep 25, 2023

Hi,
When adding the docs to the code this Saturday, I started thinking, maybe late, about the license of SuperPoint, and got an answer from an original contributor, Paul-Edouard Sarlin.
It turns out it can't be used commercially. I am not very familiar with legal matters like this, but does it compromise this PR? Or, from the Hugging Face perspective, is adding the license as in the original repo sufficient? I added it to the model card anyway.

@rafaelpadilla (Contributor)

> [quoting @sbucaille's license question above]

Hi @sbucaille,

It seems that the original code is under the MIT license. If that is the case, you just need to add the MIT license at the top of the files, as done in graphormer and IDEFICS.

The checkpoints seem to be under a customized non-commercial license. So, since you have already added the license here, you just need to set inference: false in the card, as done in owlvit-large-patch14 and musicgen-large (see the sketch below).
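Concretely, the model card front matter would then look something like this (a sketch; only the inference flag and a non-standard license marker are implied by the comment above):

```yaml
---
license: other
inference: false
---
```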

sbucaille and others added 24 commits February 4, 2024 17:53
…rPointEncoder convolution layers are abstracted into SuperPointConvBlock. SuperPointConfig now has different attributes to separate encoder and decoders parameters. convert_superpoint_to_pytorch.py has been changed accordingly.
…uperPoint code to comply with library standard
@sbucaille (Contributor Author)

Hi,
Alright, I can't wrap my head around this git problem; I feel like I've ruined my branch with the manipulations I tried.
What do you think I should do? I'm about to erase my branch completely and create a fresh one from the main branch, onto which I could put back all the work I've done over these past months...

@ydshieh (Collaborator) commented Feb 5, 2024

@sbucaille

Sorry about this, but I'm afraid there is nothing we can do to help in this situation (at least not from my side).
One tip for future PRs: don't use git merge; use git rebase to keep the commits linear (see the sketch below). It will avoid such undesired situations.
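For future reference, a sketch of that rebase workflow (the remote names are assumptions; add_superpoint is this PR's branch):

```bash
git fetch upstream                         # upstream = the huggingface/transformers repo
git rebase upstream/main                   # replay your commits on top of the latest main
# resolve any conflicts, `git add` the files, then:
git rebase --continue
git push --force-with-lease origin add_superpoint
```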

github-actions bot commented Mar 1, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Mar 10, 2024
@sbucaille sbucaille deleted the add_superpoint branch March 19, 2024 21:33