
Add DropBlock regularization layer. #137

Closed
sebastian-sz opened this issue Feb 16, 2022 · 16 comments

Comments

@sebastian-sz
Contributor

DropBlock is a regularization technique that is more suitable for CNNs than regular dropout. Perhaps it would be beneficial to have it available in keras-cv?

Paper

Example TF Implementation

@LukeWood
Contributor

Hey @sebastian-sz - what would this API look like for users? Some of the implementation details for this preprocessing technique are a bit nuanced to me.

Does it rely on passing bounding boxes or segmentation maps to perform augmentation? Does it rely on activations of specific layers of your CNN?

Please provide these details and comment back. Thanks!

@sebastian-sz
Contributor Author

@LukeWood

what would this API look like for users?

I thought it could be used similarly to tf.keras.layers.Dropout. Example (rewrite from here):

x = tf.keras.layers.Conv2D(...)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.ReLU()(x)
x = DropBlock(keep_probability=0.9, block_size=7)(x)

Users would only have to worry about passing the block_size and keep_probability parameters.
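For illustration, here is a rough sketch of what the forward pass could look like. It is a simplified adaptation of the paper's algorithm, not an existing implementation: it skips the valid-seed-region restriction, samples the mask per channel, and the DropBlock2D name is just a placeholder.

import tensorflow as tf

class DropBlock2D(tf.keras.layers.Layer):
    # Zeroes out contiguous block_size x block_size regions during training,
    # then rescales the remaining activations.
    def __init__(self, keep_probability=0.9, block_size=7, **kwargs):
        super().__init__(**kwargs)
        self.keep_probability = tf.Variable(
            keep_probability, trainable=False, dtype="float32")
        self.block_size = block_size

    def call(self, inputs, training=None):
        if not training:
            return inputs
        keep_prob = tf.cast(self.keep_probability, inputs.dtype)
        shape = tf.shape(inputs)
        height = tf.cast(shape[1], inputs.dtype)
        width = tf.cast(shape[2], inputs.dtype)
        block_size = tf.cast(self.block_size, inputs.dtype)
        # Seed probability gamma from the paper, generalized to
        # non-square feature maps.
        gamma = ((1.0 - keep_prob) / block_size**2
                 * height * width
                 / ((height - block_size + 1.0) * (width - block_size + 1.0)))
        # Sample block centers, then grow each center into a full block
        # via max pooling.
        seeds = tf.cast(
            tf.random.uniform(shape, dtype=inputs.dtype) < gamma, inputs.dtype)
        block_mask = 1.0 - tf.nn.max_pool2d(
            seeds, ksize=self.block_size, strides=1, padding="SAME")
        # Rescale so the expected magnitude of the activations is preserved.
        scale = (tf.cast(tf.size(block_mask), inputs.dtype)
                 / (tf.reduce_sum(block_mask) + 1e-7))
        return inputs * block_mask * scale

Keeping keep_probability in a tf.Variable rather than a plain float would also leave the door open for updating it from outside the layer (see the scheduling discussion below).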

Does it rely on passing bounding boxes or segmentation maps to perform augmentation?

No, as far as I know it can be treated as a "bonus" block, as it only performs regularization. For example, the paper mentions adding it to the backbone of RetinaNet to boost the final mAP.

Does it rely on activations of specific layers of your CNN?

The paper suggests using DropBlock for groups 3 and 4 in ResNet. The implementation applies DropBlock after the ReLU in the Conv->BatchNorm->ReLU block.

Limitations:

Sadly, this can get complicated:

  1. The paper mentions "Scheduled DropBlock", where instead of keeping keep_probability constant, the value starts at 1 and slowly decreases (to the target value) with each training step. The reference implementation is here.
    I don't think this is easy in the proposed implementation, as the layer would somehow need access to the total number of steps and the current step.

  2. The implementation also modifies the keep_probability value depending on which group the block is applied to. As I understand it, this would also change with regard to 1).

This complicates the proposed solution a lot, but I'm not sure how crucial these points are for the layer to provide better results.

Let me know what you think.

@bhack
Contributor

bhack commented Feb 17, 2022

It would be nice if we could support Google's AutoDropout, which generally performed better than the "fixed" DropBlock:

https://arxiv.org/abs/2101.01761

google-research/google-research#727

Also, as you can see in the above paper, I don't think this is strictly preprocessing/CV specific:

The learned dropout patterns also transfer to different tasks and datasets, such as from language modeling on Penn Treebank to English-French translation on WMT 2014.

What will we do if we want to reuse this in keras-nlp?

@bhack
Contributor

bhack commented Feb 17, 2022

See also my previous comment at #30 (comment)

@LukeWood
Contributor

This seems like a useful layer to me. As for how to handle reuse in KerasNLP: transparently, we don't know yet.

@sebastian-sz
Contributor Author

@LukeWood I think the implementation can wait for the preprocessing layer API refactor?

The layer mentioned by @bhack is also interesting. Should AutoDropout be implemented instead of DropBlock, or should both layers coexist?

@LukeWood
Contributor

Yes, let’s wait for the preprocessing layer refactor on this one. It should be available soon.

@LukeWood
Contributor

Should AutoDropout be implemented instead of DropBlock, or should both layers coexist?

I will need to read about the differences in detail before answering that one. I’m not familiar enough with the techniques yet.

@bhack
Contributor

bhack commented Feb 19, 2022

As I've mentioned, the GitHub link to the reference implementation in the Google paper is broken.

Both are landing in PyTorch's torchvision:

pytorch/vision#5416

@sebastian-sz
Contributor Author

Interesting, I didn't know about the torchvision PR. Their proposed API looks similar to what I described above. I think it would be beneficial to also have DropBlock implemented here.

@LukeWood
Contributor

Thanks for providing so many references. This looks like a great contribution. I've added the contributions welcome label.

@LukeWood
Contributor

LukeWood commented Mar 9, 2022

@sebastian-sz have you looked at the increasing DropBlock schedule that the paper recommends?

@bhack
Contributor

bhack commented Mar 9, 2022

Just a reminder in case we want to extend this to the 3D case:
https://github.com/pytorch/vision/pull/5416/files

@sebastian-sz
Contributor Author

@LukeWood
Yes, the example is here. It seems one would need access to total_steps and current_step. I'm not sure if there is an easy way to access those without having the user explicitly pass total_steps.
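One possible workaround, just a sketch rather than an existing API: if keep_probability lives in a tf.Variable (as in the rough sketch earlier in this thread), the user could pass total_steps to a Keras callback instead of to the layer, and the callback would decay the value every batch. The downside is that total_steps still has to be supplied explicitly by the user. The ScheduledDropBlock name is hypothetical.

import tensorflow as tf

class ScheduledDropBlock(tf.keras.callbacks.Callback):
    # Hypothetical callback: linearly decays keep_probability from 1.0
    # to target_keep_probability over total_steps training batches.
    def __init__(self, dropblock_layers, target_keep_probability, total_steps):
        super().__init__()
        self.dropblock_layers = dropblock_layers
        self.target = target_keep_probability
        self.total_steps = total_steps
        self.step = 0

    def on_train_batch_begin(self, batch, logs=None):
        self.step += 1
        fraction = min(self.step / self.total_steps, 1.0)
        keep_prob = 1.0 - fraction * (1.0 - self.target)
        for layer in self.dropblock_layers:
            # Assumes keep_probability is stored as a tf.Variable on the layer.
            layer.keep_probability.assign(keep_prob)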

@bhack
Yes, the 3D case is similar. It could be added in this PR or with a separate issue + PR. I can add it here; what are your opinions?

@bhack
Contributor

bhack commented Mar 10, 2022

@bhack
Yes, the 3D case is similar. It could be added in this PR or with a separate issue + PR. I can add it here; what are your opinions?

As you like.

@sebastian-sz
Contributor Author

@bhack I'd prefer the 3D variant to be added in a separate PR (and perhaps a separate issue). There is still some work to do on the 2D variant.
