Dinov2 for depth estimation #26057

Closed
rfan-debug opened this issue Sep 8, 2023 · 4 comments · Fixed by #26092

Comments

@rfan-debug

rfan-debug commented Sep 8, 2023

Feature request

The original DINOv2 repo has an example notebook (notebook link) that uses a DINOv2 backbone with a DPT head for depth estimation. If we integrate it into the transformers repo by adding a Dinov2ForImageDepthEstimation class whose forward method returns DepthEstimatorOutput, we'll have a unified output interface across all depth estimation models. That would make it easy to chain this powerful depth estimation method with other models in transformers' pipelines.
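
To make the "unified output interface" point concrete, here is a minimal pure-Python sketch. It does not use transformers at all: DepthOutput, ConstantDepthModel, SlopedDepthModel, and nearest_pixel are hypothetical names invented for illustration, standing in for DepthEstimatorOutput, two different depth models, and a downstream consumer.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple

Image = List[List[float]]  # toy grayscale image: rows of pixel intensities


@dataclass
class DepthOutput:
    """Hypothetical stand-in for a shared output type such as DepthEstimatorOutput."""
    predicted_depth: List[List[float]]  # one depth value per pixel


class DepthModel(Protocol):
    def forward(self, image: Image) -> DepthOutput: ...


class ConstantDepthModel:
    """Toy model: predicts the same depth everywhere."""

    def forward(self, image: Image) -> DepthOutput:
        return DepthOutput([[1.0 for _ in row] for row in image])


class SlopedDepthModel:
    """Toy model: depth shrinks toward the right edge of the image."""

    def forward(self, image: Image) -> DepthOutput:
        return DepthOutput(
            [[float(len(row) - 1 - x) for x in range(len(row))] for row in image]
        )


def nearest_pixel(model: DepthModel, image: Image) -> Tuple[int, int]:
    """Downstream step that relies only on the shared output type."""
    depth = model.forward(image).predicted_depth
    candidates = [(d, y, x) for y, row in enumerate(depth) for x, d in enumerate(row)]
    _, y, x = min(candidates)  # smallest depth wins; ties go to the first pixel
    return (y, x)


image = [[0.2, 0.5, 0.9], [0.1, 0.4, 0.8]]
for model in (ConstantDepthModel(), SlopedDepthModel()):
    print(type(model).__name__, nearest_pixel(model, image))
```

Because nearest_pixel only touches predicted_depth, either toy model can be swapped in without changing the downstream code, which is the property the request is after.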

Motivation

This would be a valuable feature for many production use cases and research problems. One example is camera angle estimation from a 2D image, where reliable depth information is critical. In my limited test cases, depth estimation with DINOv2 + DPT head was much better than with the existing DPT model on its own.

Your contribution

I can submit a PR to add this feature if more experienced developers don't have the bandwidth for it. (I am relatively new to the transformers development workflow, though.)

@amyeroberts
Collaborator

Hi @rfan-debug, this would be a great contribution!

If you'd like to open a PR we'd be happy to review and answer any questions if you need help.

cc @rafaelpadilla

@NielsRogge
Contributor

Hi,

So I saw they released the DINOv2 checkpoints with a DPT head: https://github.com/facebookresearch/dinov2#pretrained-heads---depth-estimation. I do have a PR which extends DPT to leverage the AutoBackbone API. This means that the DPT head can be used together with any backbone (like ViT, DINOv2, etc.). This way, we could just do the following:

from transformers import Dinov2Config, DPTConfig, DPTForDepthEstimation

# Use four hidden layers so that stage1..stage4 all exist as backbone outputs.
backbone_config = Dinov2Config(num_hidden_layers=4, num_attention_heads=4, out_features=["stage1", "stage2", "stage3", "stage4"])
config = DPTConfig(backbone_config=backbone_config)
model = DPTForDepthEstimation(config)

So it would be great to leverage this instead of adding a standalone Dinov2ForDepthEstimation.
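
A note on choosing out_features for a backbone config like the one above: as far as I understand, Dinov2Config names its stages "stem", "stage1", "stage2", and so on, with one stage per hidden layer, and out_features has to be drawn from those names. The pure-Python sketch below mimics that naming rule without importing transformers. dinov2_stage_names and check_out_features are hypothetical helpers written for illustration, and the rule itself is an assumption that should be checked against the installed transformers version.

```python
from typing import List


def dinov2_stage_names(num_hidden_layers: int) -> List[str]:
    # Assumed convention: "stem" for the embeddings, then one stage per layer.
    return ["stem"] + [f"stage{i}" for i in range(1, num_hidden_layers + 1)]


def check_out_features(num_hidden_layers: int, out_features: List[str]) -> List[str]:
    """Return the requested features that do not exist for this layer count."""
    available = set(dinov2_stage_names(num_hidden_layers))
    return [name for name in out_features if name not in available]


requested = ["stage1", "stage2", "stage3", "stage4"]
print(check_out_features(2, requested))  # ['stage3', 'stage4']
print(check_out_features(4, requested))  # []
```

Under this assumption, a backbone that exposes stage1 through stage4 needs at least four hidden layers.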

@rfan-debug
Author

@NielsRogge Leveraging the AutoBackbone API is a great idea. Thanks for your advice and contributions! I'll follow your code examples.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
