
bit mask support for dropout #656

Open
eric-haibin-lin opened this issue Feb 27, 2020 · 4 comments
Labels: enhancement (A feature or an optimization request)

eric-haibin-lin commented Feb 27, 2020

For dropout training, one can save the dropout mask with 1 bit per coordinate. Can we support that in DNNL? Memory is precious.
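To make the memory argument concrete, here is a minimal C++ sketch of 1-bit-per-coordinate mask storage (the helper name is invented; this is not DNNL API): n keep/drop decisions fit in ceil(n/8) bytes instead of n bytes for a u8 mask or 4n bytes for an f32 mask.

```cpp
// Hypothetical helper, not DNNL API: pack n keep/drop decisions into
// ceil(n/8) bytes, one bit per coordinate (bit set == "keep").
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<std::uint8_t> pack_mask(const std::vector<bool> &keep) {
    std::vector<std::uint8_t> bits((keep.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < keep.size(); ++i)
        if (keep[i]) bits[i / 8] |= std::uint8_t(1u << (i % 8));
    return bits;
}
```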

eric-haibin-lin added the enhancement label on Feb 27, 2020
vpirogov (Member) commented

Hi @eric-haibin-lin,

Thank you for your question. Technically nothing prevents us from introducing a dropout primitive in the library, including 1-bit mask support. The main question we need to answer to make this happen is what the API and behavior should look like to make the functionality generally useful. For dropout the main source of concern is that it relies on a random number generator, which may behave differently across applications, so making a random number generator part of the implementation would be a major source of incompatibility and thread-safety issues.

A couple of follow up questions so that I can better understand what you are looking for:

  • What do you expect from the DNNL implementation (vs. implementing the functionality directly in C++)?
  • What API would make sense to you? Would a function that takes a pre-computed mask and performs dropout be viable? (A sketch of such an interface follows this list.)
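To illustrate the second question, here is a hypothetical sketch of such an interface (names invented; not an actual DNNL API): the application owns the RNG and supplies a pre-computed 1-bit mask, and the library only applies it with the usual inverted-dropout scaling.

```cpp
// Hypothetical API sketch, not DNNL: the caller generates the mask with its
// own RNG; the library only applies it.
// src, dst: n float elements; mask: ceil(n/8) bytes, bit i set == "keep";
// scale is typically 1 / (1 - p) for inverted dropout.
#include <cstddef>
#include <cstdint>

void dropout_apply_mask(const float *src, float *dst, std::size_t n,
                        const std::uint8_t *mask, float scale) {
    for (std::size_t i = 0; i < n; ++i) {
        bool keep = mask[i / 8] & (1u << (i % 8));
        dst[i] = keep ? src[i] * scale : 0.f;
    }
}
```

Such a split would sidestep the RNG compatibility and thread-safety concerns above, since the library would never generate random numbers itself.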

vpirogov self-assigned this on Feb 28, 2020
TaoLv (Contributor) commented Mar 2, 2020

I think the random number generator takes a significant share of the execution time of dropout. That's why we optimized it in MXNet with viRngBernoulli from VSL. But viRngBernoulli can no longer meet the requirement for bit mask generation. So I would expect DNNL to cover the RNG part, in which case the forward interface would look like:

  • input: source data, random seed, distribution type, mask type, p value
  • output: destination data, workspace for the mask

where the mask type can be a bit mask, boolean mask, or integer mask. I'm not sure whether the boolean or integer mask has any advantage, but they are used in frameworks.
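A rough C++ sketch of that proposed forward interface, under the assumption that the library owns the RNG and the caller supplies the seed (all names are invented, and std::bernoulli_distribution stands in for an optimized generator such as viRngBernoulli):

```cpp
// Hypothetical sketch of the forward interface proposed above, not DNNL API:
// the library owns the RNG, seeded by the caller for reproducibility.
#include <cstddef>
#include <cstdint>
#include <random>

enum class mask_kind { bit, boolean, integer };

// Writes dst and fills `workspace` with the mask; for mask_kind::bit the
// workspace must hold ceil(n/8) bytes.
void dropout_forward(const float *src, float *dst, std::size_t n, float p,
                     std::uint64_t seed, mask_kind kind,
                     std::uint8_t *workspace) {
    std::mt19937_64 rng(seed);  // stand-in for an optimized Bernoulli RNG
    std::bernoulli_distribution keep(1.0 - p);
    const float scale = 1.f / (1.f - p);
    if (kind == mask_kind::bit) {
        for (std::size_t i = 0; i < n; ++i) {
            if (i % 8 == 0) workspace[i / 8] = 0;
            bool k = keep(rng);
            workspace[i / 8] |= std::uint8_t(k) << (i % 8);
            dst[i] = k ? src[i] * scale : 0.f;
        }
    }
    // boolean and integer mask kinds omitted for brevity
}
```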

@eric-haibin-lin @apeforest Could you please share more insights about the random seed distribution in MXNet and reproducibility of the operator?

apeforest commented

The random seed should be taken from MXNet so that, if a user specifies a random seed in MXNet, reproducibility is guaranteed. A similar approach was taken for the cuDNN library: apache/mxnet#17547
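For illustration, a self-contained sketch (not MXNet or cuDNN code) of why taking the seed from the framework gives reproducibility: the mask becomes a pure function of the seed, so the same seed always reproduces the same mask.

```cpp
// Self-contained sketch, not MXNet/cuDNN code: with the seed supplied by the
// framework, re-running with the same seed reproduces the same bit mask.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

std::vector<std::uint8_t> make_mask_bits(std::uint64_t seed, std::size_t n,
                                         double p) {
    std::mt19937_64 rng(seed);
    std::bernoulli_distribution keep(1.0 - p);
    std::vector<std::uint8_t> bits((n + 7) / 8, 0);
    for (std::size_t i = 0; i < n; ++i)
        bits[i / 8] |= std::uint8_t(keep(rng)) << (i % 8);
    return bits;
}

int main() {
    std::uint64_t framework_seed = 42;  // would come from the framework's seed API
    assert(make_mask_bits(framework_seed, 1000, 0.5)
            == make_mask_bits(framework_seed, 1000, 0.5));
}
```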

TaoLv (Contributor) commented Jun 20, 2020

I noticed there is an RFC open for this request. You may want to take a look. @eric-haibin-lin @apeforest @pengzhao-intel
