
Fix transposed convolution in CPU w/o MKLDNN. #14031

Closed
wants to merge 38 commits

Conversation

@apeforest (Contributor) commented Jan 31, 2019

Description

The transposed convolution operator on CPU w/o MKLDNN does not work properly when dilation is set. This is because the mshadow library functions unpack_patch2col and pack_col2patch generate incorrect results when the dilation parameter is used. This PR replaces these two functions with the MXNet native functions im2col and col2im.

This PR fixes issue #11203.
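
For background on the two replacement functions: im2col lays each (possibly dilated) receptive field of the input out as one column of a matrix so that the convolution reduces to a single GEMM, and col2im is its adjoint, which is what the deconvolution forward pass uses. Below is a minimal NumPy sketch of im2col with a dilation parameter; it is illustrative only and does not mirror the actual MXNet im2col signature.

import numpy as np

def im2col_2d(img, k, stride=1, pad=0, dilation=1):
    """Illustrative single-channel im2col: each output column is one receptive field."""
    img = np.pad(img, pad, mode='constant')
    h, w = img.shape
    eff_k = dilation * (k - 1) + 1  # effective kernel extent once dilation is applied
    out_h = (h - eff_k) // stride + 1
    out_w = (w - eff_k) // stride + 1
    cols = np.zeros((k * k, out_h * out_w))
    for y in range(out_h):
        for x in range(out_w):
            patch = img[y * stride : y * stride + eff_k : dilation,
                        x * stride : x * stride + eff_k : dilation]
            cols[:, y * out_w + x] = patch.ravel()
    return cols

The key detail is the dilation step in the slicing: the taps of a receptive field are spaced dilation pixels apart, which is exactly the case the old mshadow unpack_patch2col/pack_col2patch path handled incorrectly.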

Passes the local test from the issue:

import mxnet as mx

data = mx.nd.array(((0,0,0),
                    (0,1,0),
                    (0,0,0)))
kernel = mx.nd.array(((1,2,3),
                      (4,5,6),
                      (7,8,9)))

data_batch = data.expand_dims(0).expand_dims(0)
weight = kernel.expand_dims(0).expand_dims(0)
# initialize and set weight
conv = mx.gluon.nn.Conv2DTranspose(in_channels=1, channels=1,
                                   kernel_size=(3,3), padding=(2,2),
                                   strides=(2,2), dilation=(2,2))
conv.initialize()
conv.weight.set_data(weight)

with mx.autograd.record():
    l = conv(data_batch)
    print('conv forward')
    print(l.asnumpy())
l.backward()
print('conv backward')
# read the gradient from the layer's own parameter (tracked automatically after initialize())
print(conv.weight.grad().asnumpy())

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@apeforest apeforest changed the title Fix transposed convolution in CPU w/o MKLDNN. [WIP] Fix transposed convolution in CPU w/o MKLDNN. Jan 31, 2019
@apeforest (Contributor, Author) commented Jan 31, 2019

Just noticed that test_deconvolution in the unit tests has been disabled for a long time and was not re-enabled even though the related issue #10973 is marked as resolved. test_deconvolution is re-enabled in PR #14032. A new test will be added with this PR to make it complete.

@apeforest (Contributor, Author) commented:
@zhreshold @thomelane Please help to review. Thanks!

@apeforest apeforest changed the title [WIP] Fix transposed convolution in CPU w/o MKLDNN. Fix transposed convolution in CPU w/o MKLDNN. Jan 31, 2019
@apeforest apeforest changed the title Fix transposed convolution in CPU w/o MKLDNN. [WIP] Fix transposed convolution in CPU w/o MKLDNN. Jan 31, 2019
@zhreshold (Member) commented:
Can you verify the result in a unit test?

@apeforest apeforest changed the title [WIP] Fix transposed convolution in CPU w/o MKLDNN. Fix transposed convolution in CPU w/o MKLDNN. Jan 31, 2019
@apeforest (Contributor, Author) commented:
@zhreshold unit test added.

@apeforest (Contributor, Author) commented:
@mxnet-label-bot add [pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label Jan 31, 2019
@@ -226,22 +227,24 @@ class DeconvolutionOp {
CHECK_EQ(in_data.size(), expected);
CHECK_EQ(out_data.size(), 1U);
Stream<xpu> *s = ctx.get_stream<xpu>();
#if defined(__CUDACC__)
CHECK_EQ(s->blas_handle_ownership_, Stream<xpu>::OwnHandle)
<< "Must init CuBLAS handle in stream";

Review comment (Contributor):
"cuBLAS" is the official abbreviation :)

@@ -503,6 +503,40 @@ def test_deconv():
# layer = nn.Conv3DTranspose(16, (3, 3, 3), layout='NDHWC', in_channels=4)
# # check_layer_forward(layer, (1, 10, 10, 10, 4))

@with_seed()
def test_deconv_dilation():

Review comment (Member):
Since deconv is a really important op, I suggest revisiting the original deconv test cases and adding dilation > 1 cases alongside the old tests. This ensures better coverage than this single test case.
Feel free to keep this unit test as well, which LGTM.
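
For illustration, one way such extra dilation > 1 coverage could look is to check that a dilated deconvolution matches a non-dilated one whose kernel has zeros inserted between the taps. This is a sketch only, assuming the usual dilated-kernel/zero-inserted-kernel equivalence; the helper name, shapes, and tolerances below are made up and are not the project's actual test utilities.

import mxnet as mx
import numpy as np

def check_deconv_dilation_equivalence(dilation=2):
    """A dilated deconv should equal a non-dilated deconv with a zero-inserted kernel."""
    data = mx.nd.random.uniform(shape=(1, 1, 5, 5))
    kernel = mx.nd.random.uniform(shape=(1, 1, 3, 3))

    # Insert zeros between kernel taps: a 3x3 kernel with dilation d covers
    # an effective (2d+1)x(2d+1) window.
    eff = dilation * (3 - 1) + 1
    dilated_kernel = np.zeros((1, 1, eff, eff), dtype=np.float32)
    dilated_kernel[:, :, ::dilation, ::dilation] = kernel.asnumpy()

    out = mx.nd.Deconvolution(data=data, weight=kernel, kernel=(3, 3),
                              dilate=(dilation, dilation), pad=(1, 1),
                              num_filter=1, no_bias=True)
    ref = mx.nd.Deconvolution(data=data, weight=mx.nd.array(dilated_kernel),
                              kernel=(eff, eff), dilate=(1, 1), pad=(1, 1),
                              num_filter=1, no_bias=True)
    np.testing.assert_allclose(out.asnumpy(), ref.asnumpy(), rtol=1e-5, atol=1e-6)

A check like this could be looped over the same kernel/stride/pad combinations the existing deconv tests already cover.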

@Vikas-kum (Contributor) left a comment:
Good catch! Looks good.

@@ -485,7 +455,6 @@ class DeconvolutionOp {
DeconvolutionParam param_;
mshadow::Shape<2> shape_colunit_;
mshadow::Shape<3> shape_dstunit_;
index_t nstep_;

Review comment (Contributor):
Can you please tell me why this was removed?

@apeforest (Contributor, Author) replied on Mar 6, 2019:
The col2im method does not support such a step.

const index_t nbatch = data.size(0);
Tensor<xpu, 1, DType> workspace =
ctx.requested[deconv::kTempSpace].get_space_typed<xpu, 1, DType>(
Shape1(this->InitTemp(out.shape_, data.shape_)), s);
for (index_t i = 0; i < nbatch; i += nstep_) {

Review comment (Contributor):
Do you know what "nstep_" was doing earlier? It would help to understand the problem with the earlier code.

@apeforest (Contributor, Author) replied on Mar 6, 2019:
The col2im method does not support such a step.


const index_t nbatch = data.size(0);
Tensor<xpu, 1, DType> workspace =
ctx.requested[deconv::kTempSpace].get_space_typed<xpu, 1, DType>(
Shape1(this->InitTemp(grad.shape_, data.shape_)), s);
for (index_t i = 0; i < nbatch; i += nstep_) {
const index_t step = std::min(nstep_, nbatch - i);

Review comment (Contributor):
Again, can you tell me what the purpose of "step" was in the previous code?

@apeforest (Contributor, Author) replied:
I think it was used to convert multiple batches of image data into columns at once in the previous library. However, that is not supported by the col2im method.
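
In other words, the old code sized one temporary column buffer for up to nstep_ images and unpacked/packed them chunk by chunk, whereas im2col/col2im in this PR operate on a single image per call, so the chunking loop disappears. The following is a rough, purely illustrative Python sketch of the two loop structures; the reshape calls are stand-ins for the real unpack/pack and im2col routines, not their actual behavior.

import numpy as np

nbatch, nstep = 8, 4
data = np.random.rand(nbatch, 1, 5, 5)

# Old flow (conceptual): one column buffer covers a chunk of up to nstep images.
for i in range(0, nbatch, nstep):
    step = min(nstep, nbatch - i)
    chunk_cols = data[i:i + step].reshape(step, -1)  # stand-in for unpack_patch2col on a chunk
    # ... one GEMM per chunk ...

# New flow (conceptual): im2col/col2im produce and consume columns one image at a time.
for i in range(nbatch):
    image_cols = data[i].reshape(1, -1)              # stand-in for im2col on a single image
    # ... one GEMM per image ...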

@access2rohit (Contributor) left a comment:
The code changes are consistent with the optimized way to perform the deconv operation on CPU only, but I have some questions that will help me understand what was happening earlier and why it was that way. Otherwise, your code is correct and precise. Good work!

@anirudhacharya (Member) commented:
@apeforest can you please rebase and resolve the merge conflicts?

@abhinavs95 (Contributor) commented:
@apeforest Could you please have a look at the CI failures?

@piyushghai (Contributor) commented:
@apeforest Gentle ping...

@Roshrini (Member) commented:
@apeforest Can you take a look at failing CI build?

@karan6181 (Contributor) commented:
@apeforest Could you please provide an update on this PR about your progress and thoughts, so that other community members can benefit from it? Thanks!

@apeforest (Contributor, Author) commented:
@karan6181 The new im2col function has a different signature and calling sequence from the old col_unpack(). The changes fail in a few unit tests, and I ended up re-implementing the operator itself. Given that MKLDNN is currently the default on CPU and has no issue with the Conv2DTranspose operator, I would like to treat this issue as lower priority and complete it in a few weeks.

@piyushghai (Contributor) commented:
@mxnet-label-bot Update[pr-work-in-progress]

@apeforest Can you look into the CI failures?

@marcoabreu marcoabreu added pr-work-in-progress PR is still work in progress and removed pr-awaiting-review PR is waiting for code review labels Jun 7, 2019
@wkcn (Member) commented Jul 10, 2019:

@apeforest Hi! Any update on this PR? This PR is important. :)

@PawelGlomski-Intel (Contributor) commented:
@szha With #11203 fixed by #20292, I believe this PR can be closed.
