-
Notifications
You must be signed in to change notification settings - Fork 27.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DINOv2 depth estimation #26092
Add DINOv2 depth estimation #26092
Conversation
The documentation is not available anymore as the PR was closed or merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this!
Main comment is about backwards compatibility with config arguments. Otherwise looks good and just a few small nits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a mile for adding support to more models!
✅ for the changes to the image processor
✅ for the conversion scripts, if the tests are optional and in separate functions.
✅ for the changes to dino, if we make them BC
🟨 for the changes to DPT, that's a lot of changes. @amyeroberts did not seem against it, but wondering if it makes sense to have a new model? Are we stuck because DINOv2 uses it and we would have a lack of consistency?
b288ab3
to
6763b36
Compare
0278c67
to
aef0d89
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating!
Just a few small things to address before merging
7b09882
to
8cd7e50
Compare
@amyeroberts thanks for your review, I've addressed all comments. Feel free to merge :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating!
A few final small comments / things to address.
General comment: I completely agree with @ArthurZucker's comments about the added complexity to the model. I'm not a fan of adding this if/else structure in the forward method. However, I don't see an easy way to change because of the backbone_out_indices
argument and the weight loading already mapping to self.dpt
.
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
c488a74
to
1b3d95c
Compare
I've addressed all your comments, feel free to merge. To do:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating!
@NielsRogge Thanks again for this addition - merged! I'll let you handle the updates to the configs on the hub. |
Hello guys. Thank you for adding this. I have a silly question that is there any plan that facebook will push this to the huggingface repo? Like |
Hi @Starlento, All models are on the hub: https://huggingface.co/models?pipeline_tag=depth-estimation&other=dinov2&sort=trending. I'll open a PR to make this more explicit in the docs |
* First draft * Fix style * More improvements * Fix tests * Fix tests * Convert checkpoint * Improve DPTImageProcessor * Remove scripts, improve conversion script * Remove print statements * Fix test * Improve docstring * More improvements * Fix style * Fix image processor * Add tests * Address comments * Address comments * Make bias backwards compatible * Address comment * Address comment * Address comment * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Address comments * Add flag * Add tests * Make tests smaller * Use regular BackboneOutput * Fix all tests * Update test * Convert more checkpoints * Convert giant checkpoints, add integration test * Rename size_divisibility to size_divisor --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Hey, guys, I want to know if it is possible to release the training part, and if it is possible, when will it be released? |
hidden_states = backbone_hidden_states | ||
|
||
patch_height, patch_width = None, None | ||
if self.config.backbone_config is not None and self.config.is_hybrid is False: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why calculate patch_height
and patch_width
(the input image size as multiples of patch_size) under these conditions only? If backbone_config is None, self.config.patch_size
can be used instead of self.config.backbone_config.patch_size
.
@NielsRogge
What does this PR do?
Fixes #26057
PR that implements a part of #25799. It extends the DPT framework to use the
AutoBackbone
class. Next it usesDinov2Backbone
to convert the DINOv2+DPT checkpoints released by the authors here.To do:
out_indices
are saved properly => done in [AutoBackbone] Add test #26094DPTImageProcessor