-
Notifications
You must be signed in to change notification settings - Fork 674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update MelResNet #751
Update MelResNet #751
Conversation
Codecov Report
@@ Coverage Diff @@
## master #751 +/- ##
==========================================
- Coverage 89.24% 89.22% -0.02%
==========================================
Files 32 32
Lines 2519 2515 -4
==========================================
- Hits 2248 2244 -4
Misses 271 271
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM. I pointed out a few nits. Please mark as "ready for review" when ready.
torchaudio/models/_wavernn.py
Outdated
Florian Stimberg, Aaron van den Oord, Sander Dieleman, Koray Kavukcuoglu. arXiv:1802.08435, 2018. | ||
r"""ResNet block based on "Deep Residual Learning for Image Recognition" | ||
|
||
The paper link is https://arxiv.org/pdf/1512.03385.pdf. The input signal is spectrogram. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "is a spectrogram"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
I also updated this change in the _wavernn
class in #735
test/test_models.py
Outdated
n_batch = 2 | ||
n_time = 200 | ||
n_freq = 100 | ||
n_output = 256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the output shape changing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I ran the test, I tried a different input value to double check if the test passes. Do you think I need to change it back to n_output = 128
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for clarifying. Let's change it back to what it was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Thanks.
torchaudio/models/_wavernn.py
Outdated
Examples:: | ||
>>> resblock = _ResBlock(num_dims=128) | ||
Examples | ||
>>> resblock = _ResBlock() | ||
>>> input = torch.rand(10, 128, 512) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the example, let's make it clear we are discussing a spectrogram:
input = torch.rand(10, 128, 512) # a random spectrogram
output = resblock(input) # shape: (10, 128, 512)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test/test_models.py
Outdated
@@ -36,18 +36,20 @@ def test_mfcc(self): | |||
class TestMelResNet(common_utils.TorchaudioTestCase): | |||
|
|||
def test_waveform(self): | |||
"""test the output dimensions after _MelResNet block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "Validate the output dimensions of a _MelResNet block."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Please revert the n_output
value in the test, and .kernel_size
if changed just for experimenting
torchaudio/models/_wavernn.py
Outdated
Florian Stimberg, Aaron van den Oord, Sander Dieleman, Koray Kavukcuoglu. arXiv:1802.08435, 2018. | ||
r"""ResNet block based on "Deep Residual Learning for Image Recognition" | ||
|
||
The paper link is https://arxiv.org/pdf/1512.03385.pdf. The input signal is a spectrogram. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: after re-reading this, I would simply remove "The input signal is a spectrogram." since you are already describing the arguments below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I removed this sentence. Thanks.
Jlin tutorials quant
This is to update the variable names and docstring in #705
Stack:
Add MelResNet Block #705, #751Add Upsampling Block #724Add WaveRNN Model #735Add example pipeline with WaveRNN #749