
Fix LSTM and GRU layers gradient calculations #18203

Merged
merged 6 commits from the rnn_fix branch into apache:master on May 14, 2020

Conversation

bgawrych
Contributor

Description

Fix for issue #17898: LSTM and GRU layers without DNNL enabled give wrong gradients.

For bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect. The cause was that the y derivative (dy) tensor was overwritten by the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculation.
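As an illustration only, here is a minimal C++ sketch of the hazard and of the "additional space" workaround described above; the function, the buffer names, and the placeholder arithmetic are invented for this example and do not reproduce MXNet's actual LstmBackward code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout: the output gradient dy and the computed input gradient
// dx share one workspace across stacked layers, so writing dx can clobber dy
// before the reverse (right2left) direction has read it.
// All buffers are assumed to have the same length.
void backward_bidirectional_layer(std::vector<float>& shared_ws,  // holds dy on entry
                                  std::vector<float>& dx_out) {
  // Fix pattern: snapshot dy into additional space *before* dx is written
  // into the shared workspace, so the right2left pass still sees a valid dy.
  const std::vector<float> dy_copy(shared_ws);

  // left2right direction: may now safely overwrite shared_ws with dx.
  for (std::size_t i = 0; i < dx_out.size(); ++i) {
    dx_out[i] = 0.5f * dy_copy[i];   // placeholder math standing in for the real gradient
    shared_ws[i] = dx_out[i];        // this write used to destroy dy
  }

  // right2left direction: reads the preserved copy of dy.
  for (std::size_t i = 0; i < dx_out.size(); ++i) {
    dx_out[i] += 0.5f * dy_copy[i];  // placeholder math
  }
}
```

(In the actual PR the extra space presumably comes from the operator's temporary workspace; the local copy above only shows the ordering constraint: dy must be preserved until the right2left direction has consumed it.)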

For GRU with number of layers > 2, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.
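Again purely as an illustration (the buffer layout and the helper below are hypothetical, not the actual GRU backward code), the difference between the buggy and the fixed pointer handling looks roughly like this:

```cpp
#include <cstddef>

// Hypothetical layout: y_all stores every layer's output contiguously,
// one block of layer_size elements per layer.
const float* layer_input_ptr(const float* x,      // network input (input of layer 0)
                             const float* y_all,  // stacked per-layer outputs
                             int layer,
                             std::size_t layer_size) {
  // Buggy pattern: reuse the current output pointer as the next layer's input,
  //   input_ptr = output_ptr;   // wrong for the middle layers
  // which feeds the wrong activations into the i2h_weight gradient.
  //
  // Fixed pattern: derive a fresh input pointer for every layer, namely the
  // output block of the layer directly below it.
  return (layer == 0) ? x
                      : y_all + static_cast<std::size_t>(layer - 1) * layer_size;
}
```

Only the pointer arithmetic matters here: the input of layer n must point at the output block of layer n-1, not at the buffer the current layer writes its own output to.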

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@mxnet-bot

Hey @bgawrych, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [miscellaneous, sanity, clang, website, windows-cpu, centos-gpu, centos-cpu, unix-cpu, unix-gpu, edge, windows-gpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@zixuanweeei
Contributor

Thanks for your contribution. CC @pengzhao-intel @TaoLv @ciyongch

@bgawrych Could you please rename the title to a more concise and specific one? When the PR is merged, the title will become the commit message. Thanks.

@bgawrych
Contributor Author

Thanks for your contribution. CC @pengzhao-intel @TaoLv @ciyongch

@bgawrych Could you please rename the title to a more concise and specific one? When the PR is merged, the title will become the commit message. Thanks.

Of course, I overlooked this. Sorry :)

@bgawrych changed the title from "Rnn fix" to "Fix LSTM and GRU layers gradient calculations" on Apr 30, 2020
@pengzhao-intel
Contributor

Thanks @bgawrych, yes, we need to cherry-pick this to 1.x.

@pengzhao-intel
Contributor

@zixuanweeei @ciyongch please help review :)

Contributor

@zixuanweeei left a comment


LGTM

@pengzhao-intel
Contributor

@bgawrych let me know when all internal cases and performance tests have passed, and then I will merge the PR.

@@ -594,6 +595,10 @@ void LstmBackward(DType* ws,
x, hx[idx], cx[idx], y, dy, dx, dhx[idx], dcx[idx],
dhy_cur_ptr, dcy_cur_ptr, w_cur_ptr, dw_cur_ptr, db_cur_ptr,
req_data, req_params, req_state, req_statecell);

// Prevent overwriting dy while calculating dx in left2right layer
const int loop_iteration = (L - 1) - i;
Contributor


Can you add an additional case with an even number of layers to the UT (the current UT only covers 1 and 3 layers)?

@bgawrych force-pushed the rnn_fix branch 3 times, most recently from 4fab561 to 34d8947 on May 12, 2020 09:13
For bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect. The cause was that the y derivative (dy) tensor was overwritten by the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculation.
The proposed fix uses additional space to avoid the overwrite.
For GRU with number of layers > 2, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.
@bgawrych
Contributor Author

@mxnet-bot run ci [windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [windows-gpu]

@bgawrych
Contributor Author

@pengzhao-intel
I think it's ready. I don't see any performance drops in internal CI, and all tests passed.

@TaoLv
Member

TaoLv commented May 13, 2020

@zixuanweeei @ciyongch @pengzhao-intel Please take another look at the new commits and also make sure your comments are addressed. Thanks!

Contributor

@ciyongch left a comment


Thanks @bgawrych for the contribution :)
LGTM

Contributor

@pengzhao-intel left a comment


Good job, Bart, merging now.

@pengzhao-intel merged commit b4c70eb into apache:master on May 14, 2020
@pengzhao-intel
Contributor

Please also cherry-pick to 1.x branch.

@ciyongch
Contributor

@bgawrych, please cherry-pick to both the v1.7.x and v1.x branches (as these two branches have diverged now), so that this patch will be included in the upcoming 1.7.0 release.

bgawrych added a commit to bgawrych/incubator-mxnet that referenced this pull request May 18, 2020
…pache#18203)

* Fix input gradient calculation for bidirectional LSTM

For bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect. The cause was that the y derivative (dy) tensor was overwritten by the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculation.
The proposed fix uses additional space to avoid the overwrite.

* Fix gradient calculation for GRU

For GRU with number of layers > 2, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.

* Enable tests for GRU and LSTM gradients

* Fix comments

* Change loop iteration deduction

* Add more test cases for fused rnn layers
bgawrych added a commit to bgawrych/incubator-mxnet that referenced this pull request May 18, 2020
…che#18203)

* Fix input gradient calculation for bidirectional LSTM

For bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect. The cause was that the y derivative (dy) tensor was overwritten by the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculation.
The proposed fix uses additional space to avoid the overwrite.

* Fix gradient calculation for GRU

For GRU with number of layers > 2, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.

* Enable tests for GRU and LSTM gradients

* Fix comments

* Change loop iteration deduction

* Add more test cases for fused rnn layers
bgawrych added a commit to bgawrych/incubator-mxnet that referenced this pull request May 27, 2020
…che#18203)

* Fix input gradient calculation for bidirectional LSTM

For bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect. The cause was that the y derivative (dy) tensor was overwritten by the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculation.
The proposed fix uses additional space to avoid the overwrite.

* Fix gradient calculation for GRU

For GRU with number of layers > 2, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.

* Enable tests for GRU and LSTM gradients

* Fix comments

* Change loop iteration deduction

* Add more test cases for fused rnn layers
bgawrych added a commit to bgawrych/incubator-mxnet that referenced this pull request May 28, 2020
…che#18203)

* Fix input gradient calculation for bidirectional LSTM

For bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect. The cause was that the y derivative (dy) tensor was overwritten by the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculation.
The proposed fix uses additional space to avoid the overwrite.

* Fix gradient calculation for GRU

For GRU with number of layers > 2, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.

* Enable tests for GRU and LSTM gradients

* Fix comments

* Change loop iteration deduction

* Add more test cases for fused rnn layers
bgawrych added a commit to bgawrych/incubator-mxnet that referenced this pull request May 28, 2020
…che#18203)

* Fix input gradient calculation for bidirectional LSTM

For bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect. The cause was that the y derivative (dy) tensor was overwritten by the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculation.
The proposed fix uses additional space to avoid the overwrite.

* Fix gradient calculation for GRU

For GRU with number of layers > 2, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.

* Enable tests for GRU and LSTM gradients

* Fix comments

* Change loop iteration deduction

* Add more test cases for fused rnn layers
bgawrych added a commit to bgawrych/incubator-mxnet that referenced this pull request May 29, 2020
…pache#18203)

* Fix input gradient calculation for bidirectional LSTM

For bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect. The cause was that the y derivative (dy) tensor was overwritten by the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculation.
The proposed fix uses additional space to avoid the overwrite.

* Fix gradient calculation for GRU

For GRU with number of layers > 2, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.

* Enable tests for GRU and LSTM gradients

* Fix comments

* Change loop iteration deduction

* Add more test cases for fused rnn layers
bgawrych added a commit to bgawrych/incubator-mxnet that referenced this pull request Jun 1, 2020
…che#18203)

* Fix input gradient calculation for bidirectional LSTM

For bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect. The cause was that the y derivative (dy) tensor was overwritten by the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculation.
The proposed fix uses additional space to avoid the overwrite.

* Fix gradient calculation for GRU

For GRU with number of layers > 2, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.

* Enable tests for GRU and LSTM gradients

* Fix comments

* Change loop iteration deduction

* Add more test cases for fused rnn layers
bgawrych added a commit to bgawrych/incubator-mxnet that referenced this pull request Jun 2, 2020
…pache#18203)

* Fix input gradient calculation for bidirectional LSTM

For bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect. The cause was that the y derivative (dy) tensor was overwritten by the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculation.
The proposed fix uses additional space to avoid the overwrite.

* Fix gradient calculation for GRU

For GRU with number of layers > 2, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.

* Enable tests for GRU and LSTM gradients

* Fix comments

* Change loop iteration deduction

* Add more test cases for fused rnn layers
pengzhao-intel pushed a commit that referenced this pull request Jun 3, 2020
…8316)

* [Large Tensor] Backport of Fixed RNN op (#17632)

* Changed relevant function args to index_t

* Added nightly test for RNN

* Added fix for LSTM, GRU, RNN-ReLU, RNN-tanh

* Using const instead of literals

* Added nightly test for RNN ReLU & tanh, LSTM, GRU

* Type assertion to force evaluation of output NDArray

* Incorporated latest round of comments

* [v1.7.x] Backport of Fix LSTM and GRU layers gradient calculations (#18203)

* Fix input gradient calculation for bidirectional LSTM

For bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect. The cause was that the y derivative (dy) tensor was overwritten by the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculation.
The proposed fix uses additional space to avoid the overwrite.

* Fix gradient calculation for GRU

For GRU with number of layers > 2, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.

* Enable tests for GRU and LSTM gradients

* Fix comments

* Change loop iteration deduction

* Add more test cases for fused rnn layers

Co-authored-by: Connor Goggins <cgoggins0@gmail.com>
pengzhao-intel pushed a commit that referenced this pull request Jun 3, 2020
* [v1.x] [Large Tensor] Backport of Fixed RNN op (#17632)

* Changed relevant function args to index_t

* Added nightly test for RNN

* Added fix for LSTM, GRU, RNN-ReLU, RNN-tanh

* Using const instead of literals

* Added nightly test for RNN ReLU & tanh, LSTM, GRU

* Type assertion to force evaluation of output NDArray

* Incorporated latest round of comments

* [v1.x] Backport of Fix LSTM and GRU layers gradient calculations (#18203)

* Fix input gradient calculation for bidirectional LSTM

For bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect. The cause was that the y derivative (dy) tensor was overwritten by the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculation.
The proposed fix uses additional space to avoid the overwrite.

* Fix gradient calculation for GRU

For GRU with number of layers > 2, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.

* Enable tests for GRU and LSTM gradients

* Fix comments

* Change loop iteration deduction

* Add more test cases for fused rnn layers

Co-authored-by: Connor Goggins <cgoggins0@gmail.com>
AntiZpvoh pushed a commit to AntiZpvoh/incubator-mxnet that referenced this pull request Jul 6, 2020
* Fix input gradient calculation for bidirectional LSTM

For bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect. The cause was that the y derivative (dy) tensor was overwritten by the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculation.
The proposed fix uses additional space to avoid the overwrite.

* Fix gradient calculation for GRU

For GRU with number of layers > 2, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.

* Enable tests for GRU and LSTM gradients

* Fix comments

* Change loop iteration deduction

* Add more test cases for fused rnn layers
ChaiBapchya pushed a commit to ChaiBapchya/mxnet that referenced this pull request Aug 15, 2020
…17632) (apache#18317)

* [v1.x] [Large Tensor] Backport of Fixed RNN op (apache#17632)

* Changed relevant function args to index_t

* Added nightly test for RNN

* Added fix for LSTM, GRU, RNN-ReLU, RNN-tanh

* Using const instead of literals

* Added nightly test for RNN ReLU & tanh, LSTM, GRU

* Type assertion to force evaluation of output NDArray

* Incorporated latest round of comments

* [v1.x] Backport of Fix LSTM and GRU layers gradient calculations (apache#18203)

* Fix input gradient calculation for bidirectional LSTM

For bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect. The cause was that the y derivative (dy) tensor was overwritten by the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculation.
The proposed fix uses additional space to avoid the overwrite.

* Fix gradient calculation for GRU

For GRU with number of layers > 2, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.

* Enable tests for GRU and LSTM gradients

* Fix comments

* Change loop iteration deduction

* Add more test cases for fused rnn layers

Co-authored-by: Connor Goggins <cgoggins0@gmail.com>