Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix pytorch istft #1

Open
faroit opened this issue Mar 14, 2019 · 15 comments
Open

Fix pytorch istft #1

faroit opened this issue Mar 14, 2019 · 15 comments

Comments

@faroit
Copy link
Owner

faroit commented Mar 14, 2019

continue the discussion from here: keunwoochoi/torchaudio-contrib#27 (comment)

@keunwoochoi I already modified the unpad, see here: https://github.com/faroit/stft-istft-experiments/blob/master/test_torch.py#L85, that didn't help, though

@keunwoochoi
Copy link

Ok I figured it out - it's windowing. My torch istft implementation assumes an istft matrix with a rectangular window (which is not intended).
In other words,

    stft_f = torch.stft(sig_t, n_fft=n_fft, hop_length=hop_length,
        # window=window_t, 
        center=True,
        normalized=False, 
        onesided=True,
        pad_mode='reflect'
    ).transpose(1, 2)

(see the commented,) i.e., without specifying window, we get a result like this.

test_torch --> test_torch 1.1528399e-07
test_scipy --> test_torch 0.21650635
test_librosa --> test_torch 0.21650635
test_tf --> test_torch 0.21650644

@keunwoochoi
Copy link

Can't quickly find how to fix it though, gotta check later.

@tz579
Copy link

tz579 commented Apr 4, 2019

@faroit @keunwoochoi
Thanks a lot for your work!
Actually, if we use Hann window as analysis window in STFT, but use no window (or to say Rectangular window) as synthesis window in ISTFT, then we could also got the 1.2736353e-07 reconstruction error, exactly the same with the case you tried above, which is no window in STFT but with Hann window in ISTFT.
The problem comes from the way we use window function in this ISTFT implementation.
For using window function, i.e. synthesis window, in ISTFT, we need to follow the approach of a 1984 paper: https://ieeexplore.ieee.org/document/1164317
We can stimulate the implementation by going through a discussion on Stackexchange in 2012: https://dsp.stackexchange.com/questions/4771/least-squares-re-synthesis-of-short-time-fourier-transform (though there are several "misleading/distracting" comments in the comment section below the question and the answer there).
I'm currently working on it, and would upload the modified python script when I'm done.

@faroit
Copy link
Owner Author

faroit commented Apr 4, 2019

@tz579 yes I know about this paper. I would love to see your implementation here. Maybe we can test drive it in this repo and than think about how we could help PyTorch to implemented the ISTFT...

@tz579
Copy link

tz579 commented Apr 4, 2019

@faroit
Thanks and I appreciate your work as well as @keunwoochoi 's, too.
Here is the version I have just implemented:
https://gist.github.com/tz579/7067976e1e78a9d0d0897adb446acb2c

I did a bunch of modification/simplification, in addition to the implementation of the synthesis window.
I have tested with your current setting in your testbench, with the results shown below:

test_torch --> test_torch 1.0049663e-07
test_torch --> test_scipy 8.236605493114531e-08
test_torch --> test_librosa 8.86663e-08
test_scipy --> test_torch 8.8000185e-08
test_scipy --> test_scipy 1.9032627208483898e-08
test_scipy --> test_librosa 3.6991832e-08
test_librosa --> test_torch 8.665219e-08
test_librosa --> test_scipy 1.212239100932319e-08
test_librosa --> test_librosa 3.428809e-08

Note that I did not take time to configure tensorflow yet. Also, for librosa verison, I shrank the data type from complex128/float64 to complex64/float32, to be comparable with other versions, and thus the results is a little bit different with that of yours in a previous post.

There is something more:
if we change window to Hamming/Blackman, or if we change the hop length from 1/2 n_fft to 1/4 n_fft, then the scipy version would induce significant errors when paired with non-scipy ones.
The scipy version is correct only if the hop length is 1/2 n_fft and the window is set to Hann or Bartlett.
I did not test windows other than Hann/Bartlett/Hamming/Blackman, as these four are the only ones supported in the torch version.

Thus, I attached all of the 4 files: test{torch, librosa, scipy}.py, as well as the main file test.py, as all of them are slightly or heavily modified.

The above results could be reproduced using the 4 files uploaded.

The only problem is that the error is still a little bit higher for pytorch version (3~4 times larger than non-torch ones), either for the "cross" or the "within" ones.

@faroit
Copy link
Owner Author

faroit commented Apr 4, 2019

@tz579 thanks for the update.

Note that I did not take time to configure tensorflow yet.

I've updated to conda environment, it should now correctly install tensorflow.

I did a bunch of modification/simplification, in addition to the implementation of the synthesis window.

that looks great. Could you add this as a PR to this repo?

@faroit
Copy link
Owner Author

faroit commented Apr 4, 2019

if we change window to Hamming/Blackman, or if we change the hop length from 1/2 n_fft to 1/4 n_fft, then the scipy version would induce significant errors when paired with non-scipy ones.
The scipy version is correct only if the hop length is 1/2 n_fft and the window is set to Hann or Bartlett.
I did not test windows other than Hann/Bartlett/Hamming/Blackman, as these four are the only ones supported in the torch version.

@tz579 this is a good point. I actually forgot to test for different hop sizes. I've updated the code and yes, I can confirm that higher overlaps will result in reconstruction errors.

While all of the implementations seem to be able to do perfect reconstruction for higher overlap than *2, the differences between the packages when in comes to how they handle the synthesis windows are quite big.

E.g. a hop of /4 of the n_fft does not lead to perfect reconstruction when applying (STFT->ISTFT) for scipy->librosa or librosa->scipy. Surprisingly it works for the torch->librosa.

@bmcfee do you have some insights of the differences for librosa and scipy for the ISTFT or how to get there by adopting the parameters?

See librosa

def stft(x, n_fft=2048, n_hopsize=1024, center=True, window='hann', dtype=np.complex64):
return librosa.core.stft(x, n_fft=n_fft, hop_length=n_hopsize, pad_mode='constant', center=center, window=window, dtype=dtype)
def istft(X, rate=44100, n_fft=2048, n_hopsize=1024, center=True, window='hann', dtype=np.float32):
return librosa.core.istft(X, hop_length=n_hopsize, center=center, window=window, dtype=dtype)

vs. scipy

def stft(x, n_fft=2048, n_hopsize=1024):
f, t, X = scipy.signal.stft(
x,
nperseg=n_fft,
noverlap=n_fft - n_hopsize,
padded=True,
)
return X * n_hopsize
def istft(X, rate=44100, n_fft=2048, n_hopsize=1024):
t, audio = scipy.signal.istft(
X / n_hopsize,
rate,
nperseg=n_fft,
noverlap=n_fft - n_hopsize,
boundary=True
)
return audio

@nils-werner I remember you spend some time understanding how scipy has implemented ISTFT. Are they doing something "non-standard" over there?

@tz579
Copy link

tz579 commented Apr 4, 2019

@faroit Thanks and I will make a PR once I'm ready to write out a brief description of the modifications.

@faroit @bmcfee @nils-werner

this is a good point. I actually forgot to test for different hop sizes. I've updated the code and yes, I can confirm that higher overlaps will result in reconstruction errors.

While all of the implementations seem to be able to do perfect reconstruction for higher overlap than *2, the differences between the packages when in comes to how they handle the synthesis windows are quite big.

E.g. a hop of /4 of the n_fft does not lead to perfect reconstruction when applying (STFT->ISTFT) for scipy->librosa or librosa->scipy. Surprisingly it works for the torch->librosa.

Also, scipy gives weird results on Blackman/Hamming, "normal" results only on Hann/Bartlett, as mentioned in the same post of mine.

It seems that scipy has some "non-canonical" implementation not only for stride/overlapping, but also for window functions?

@bmcfee
Copy link

bmcfee commented Apr 4, 2019

@bmcfee do you have some insights of the differences for librosa and scipy for the ISTFT or how to get there by adopting the parameters?

Not off the top of my head, no. Do you have a sense of where the errors pop up? Are they due to boundary effects, or do they pop up all the way through?

@tz579
Copy link

tz579 commented Apr 5, 2019

@faroit

A pull request has been made from my fork of your repo.

I have done several modifications to remove the unnecessary changes from my side, leaving only "useful" ones.

The only file that has significant modifications is test_torch.py, while test_scipy.py is also slightly modified to add functionality of choosing window type.
Thanks!

Edit on Apr. 08, 2019:

Pull request updated with error fixes, as well as better handling of input/output for optional batch dim, numpy/torch.tensor array types, and cpu/cuda device for torch.tensor.

Also, I can confirm that after your fix of test_scipy.py, scipy version can handle non-1/2 stride now.

However, scipy version still induce significant reconstruction error with Hamming window, just as what it has behaved all along our discussion.

@faroit
Copy link
Owner Author

faroit commented Apr 8, 2019

Not off the top of my head, no. Do you have a sense of where the errors pop up? Are they due to boundary effects, or do they pop up all the way through?

@bmcfee it was just a scale issue. I fixed it in 7242420. Now librosa and scipy have perfect reconstruction in any combination. Maybe you would like to update your gist?

@faroit
Copy link
Owner Author

faroit commented Apr 8, 2019

@tz579 thanks for your effort. Maybe we could bring in @f0k and @ksanjeevan...

@f0k the goal here is not to find a good API or to write a wrapper that works with all STFT flavors. Instead to find STFT/ISTFT parameters that would allow one to interchange the library. This is useful e.g. if you want to run the synthesis operation of the deep learning framework. I think it would be okay to find a working set of parameters like hann = half-overlap/quarter overlap which would work for most applications.

@f0k
Copy link

f0k commented Apr 8, 2019

the goal here is not to find a good API or to write a wrapper that works with all STFT flavors

Hey, am I too demanding in the torchaudio redesign? :)

Instead to find STFT/ISTFT parameters that would allow one to interchange the library.

In order to find good defaults for pytorch's istft? From what I've read in the thread, you can do good reconstructions with many combinations now? Sorry, I guess I haven't understood what the question is.

Just a brief note that might be relevant, but might be known to you anyway:

>>> np.allclose(np.hamming(k), torch.hamming_window(k))
False
>>> np.allclose(np.hamming(k), torch.hamming_window(k, periodic=False))
True

@faroit
Copy link
Owner Author

faroit commented Apr 19, 2019

@tz579 any update on hopsize != win/2? Did you try the updated deconv implementation from @seungwonpark?

@tz579
Copy link

tz579 commented Apr 19, 2019

@faroit

For non-1/2 hopsize issue, please look at my previous post to you, where I have confirmed that your modification has resolved this problem for Scipy.

For deconv implementation by seungwonpark, I have tried an older version but not the newest one, since it does not fit my need in my current project. However, for the version in the pull request I made to your project, I have made several acceleration attempt, mainly to reduce the levels of loops (now rfft function would only be called once during the whole process), and it has been tested to be significantly faster, from my project. You could try to merge my pull request in your private branch and try it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants