Using the ViT for Single-channel Images #727
Unanswered
ahmed1996said asked this question in Q&A
Replies: 2 comments 2 replies
-
Hi there.
0 replies
-
@Abhishek-Prajapat just create the model with arg EDIT: also @ahmed1996said
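The argument name did not survive in the reply above. One option that timm does expose for this is the `in_chans` keyword of `timm.create_model`; that this is what the reply meant is an assumption. The sketch below wraps that call in a hypothetical helper, `build_single_channel_vit`, and adds a second hypothetical helper, `collapse_rgb_kernel`, which illustrates in plain Python the idea of reducing a 3-channel pretrained kernel to one channel by summing over the channel axis. That timm adapts pretrained weights this way is my understanding, not something stated in this thread.

```python
def collapse_rgb_kernel(kernel):
    """Sum one patch-embedding filter over its input-channel axis.

    kernel: nested list shaped [in_chans][height][width].
    Returns a nested list shaped [1][height][width].
    """
    height, width = len(kernel[0]), len(kernel[0][0])
    summed = [
        [sum(channel[r][c] for channel in kernel) for c in range(width)]
        for r in range(height)
    ]
    return [summed]


def build_single_channel_vit(pretrained=True):
    """Create vit_base_patch16_224 with a 1-channel patch embedding (hypothetical helper)."""
    # Imported lazily so the helper above can run without timm installed.
    import timm

    # in_chans=1 only changes the patch-embedding input; the embedding width stays 768.
    return timm.create_model("vit_base_patch16_224", pretrained=pretrained, in_chans=1)


# Toy check of the channel collapse on a 3-channel, 2x2 filter.
toy_kernel = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],
]
collapsed = collapse_rgb_kernel(toy_kernel)
print(collapsed)
```

With timm and PyTorch installed, calling `build_single_channel_vit()` and passing a tensor shaped `(batch, 1, 224, 224)` should work without touching the encoder, attention, or MLP widths.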
2 replies
-
Hello all,
I'm wondering how to fine-tune a pretrained ViT model on single-channel images?
I tried adjusting the architecture (for vit_base_patch16_224) so that the patch embedding projection layer is instead of
Similarly, I changed the encoder inputs/outputs from 768 to 256 and made matching changes to the attention qkv, the attention projection layer, and the MLP.
However, training fails with an error about mismatched tensor sizes:
Sizes of tensors must match except in dimension 2. Got 256 and 768 (The offending index is 0)
The error is triggered by the forward pass function.
Any suggestions on how to correctly use the model for single channel images?
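One plausible reading of the error, offered as an assumption since the traceback is not shown: the patch tokens were narrowed to 256 features, but another tensor joined with them in the forward pass (for example a class token) is still 768 wide. The plain-Python sketch below mimics the shape rule that concatenation enforces; `cat_shape` is a hypothetical helper, not a PyTorch function, and the shapes are illustrative.

```python
def cat_shape(shapes, dim):
    """Return the shape produced by concatenating along `dim`.

    Raises ValueError if any other dimension disagrees, which is the
    rule tensor concatenation enforces.
    """
    first = shapes[0]
    for shape in shapes[1:]:
        for axis, (a, b) in enumerate(zip(first, shape)):
            if axis != dim and a != b:
                raise ValueError(
                    f"cannot concatenate along dim {dim}: "
                    f"sizes {a} and {b} differ in dim {axis}"
                )
    out = list(first)
    out[dim] = sum(shape[dim] for shape in shapes)
    return tuple(out)


# A 224x224 image cut into 16x16 patches gives 14 * 14 = 196 patch tokens.
num_patches = (224 // 16) ** 2

# Class token still 768 wide, patch tokens narrowed to 256: the shapes clash.
try:
    cat_shape([(1, 1, 768), (1, num_patches, 256)], dim=1)
    mismatch = None
except ValueError as err:
    mismatch = str(err)

# Keeping the embedding width at 768 everywhere makes the shapes compatible.
fixed = cat_shape([(1, 1, 768), (1, num_patches, 768)], dim=1)

print(mismatch)
print(fixed)
```

If this reading is right, the embedding width does not need to change at all for single-channel input; only the number of input channels of the patch embedding does.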