Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WFDB psf_fseek bug happening again #72

Closed
briangow opened this issue Jun 3, 2024 · 6 comments
Closed

WFDB psf_fseek bug happening again #72

briangow opened this issue Jun 3, 2024 · 6 comments
Assignees

Comments

@briangow
Copy link
Collaborator

briangow commented Jun 3, 2024

In #63 the soundfile version was changed in an attempt to get around a psf_fseek error. This error is occurring again. This time is happening with a WFDB file from a GE ICU monitor (with soundfile==0.11.0 or soundfile==0.10.2):

./waveform_benchmark.py -r /home/briangow/chorus/data/waveform_benchmark/waveforms/GE_WFDB/A004-0502133511/A004-0502133511.hea -f waveform_benchmark.formats.wfdb.WFDBFormat516

________________________________________________________________
Read performance (median of N trials):
 #seek  #read      KiB      sec     [N]
 70126  44259   353628  15.2345     [3] read 1 x 436270s, all channels
   720    383     4348   0.1067    [68] read 5 x 500s, all channels
  3610   1524    24004   0.4193    [17] read 50 x 50s, all channels
 36894  14112   200724   4.2547     [3] read 500 x 5s, all channels
 70126  44259   353628  10.2673     [3] read 1 x 436270s, one channel
Traceback (most recent call last):
  File "/home/briangow/chorus/repos/chorus_waveform/./waveform_benchmark.py", line 6, in <module>
    waveform_benchmark.__main__.main()
  File "/home/briangow/chorus/repos/chorus_waveform/waveform_benchmark/__main__.py", line 122, in main
    run_benchmarks(input_record=opts.input_record,
  File "/home/briangow/chorus/repos/chorus_waveform/waveform_benchmark/benchmark.py", line 234, in run_benchmarks
    fmt().read_waveforms(path, t0, t1, [c])
  File "/home/briangow/chorus/repos/chorus_waveform/waveform_benchmark/formats/wfdb.py", line 90, in read_waveforms
    record = wfdb.rdrecord(path, sampfrom=start_frame, sampto=end_frame,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/briangow/chorus/repos/chorus_waveform/env/lib/python3.11/site-packages/wfdb/io/record.py", line 2113, in rdrecord
    record.e_d_signal = _signal._rd_segment(
                        ^^^^^^^^^^^^^^^^^^^^
  File "/home/briangow/chorus/repos/chorus_waveform/env/lib/python3.11/site-packages/wfdb/io/_signal.py", line 1207, in _rd_segment
    datsignals = _rd_dat_signals(
                 ^^^^^^^^^^^^^^^^
  File "/home/briangow/chorus/repos/chorus_waveform/env/lib/python3.11/site-packages/wfdb/io/_signal.py", line 1335, in _rd_dat_signals
    data_to_read = _rd_compressed_file(
                   ^^^^^^^^^^^^^^^^^^^^
  File "/home/briangow/chorus/repos/chorus_waveform/env/lib/python3.11/site-packages/wfdb/io/_signal.py", line 1882, in _rd_compressed_file
    sf.seek(start_samp + sample_offset)
  File "/home/briangow/chorus/repos/chorus_waveform/env/lib/python3.11/site-packages/soundfile.py", line 799, in seek
    _error_check(self._errorcode)
  File "/home/briangow/chorus/repos/chorus_waveform/env/lib/python3.11/site-packages/soundfile.py", line 1404, in _error_check
    raise LibsndfileError(err, prefix=prefix)
soundfile.LibsndfileError: Internal psf_fseek() failed.
@bemoody
Copy link
Collaborator

bemoody commented Jun 3, 2024

Unfortunately not a bug in soundfile/libsndfile, this looks like a bug in libFLAC.

flac --test says the file is OK.

rdsamp -r wavetest -f s76075827 # OK
rdsamp -r wavetest -f s76075828 # FAIL
rdsamp -r wavetest -f s76079511 # FAIL
rdsamp -r wavetest -f s76079512 # OK

flac -df -o /dev/null wavetest.dat --skip=76075826 # OK
flac -df -o /dev/null wavetest.dat --skip=76075827 # FAIL
flac -df -o /dev/null wavetest.dat --skip=76079510 # FAIL
flac -df -o /dev/null wavetest.dat --skip=76079511 # OK

@bemoody
Copy link
Collaborator

bemoody commented Jun 3, 2024

Oof. As I unfortunately suspected, it looks like libFLAC is hitting a false sync code (that occurs by chance within the encoded data.)

This is something I worried about when I was studying the FLAC format some years ago, but I tried and wasn't able to cause such an error even intentionally.

And... guess what? The problem here doesn't seem to occur with libFLAC 1.3.3. Apparently there were some changes to the sync algorithm in libFLAC 1.4.0, and where the old version correctly ignored the spurious sync code, the new version throws an error. :(

(Seeking to sample 76079510 jumps to byte 274485255; seeking to sample 76079511 jumps to byte 274485260. The false sync code is at byte 274485259.)

Running a test now but I suspect the problem won't happen with libFLAC 1.3.3.

One way we could work around this is to add a seektable (which increases the file size slightly, and should also improve random I/O performance.) This can be done using the metaflac command-line tool (e.g.: metaflac --add-seekpoint=1s wavetest.dat). I don't know if there's a way to do this using soundfile/libsndfile.

@bemoody
Copy link
Collaborator

bemoody commented Jun 3, 2024

Running a test now but I suspect the problem won't happen with libFLAC 1.3.3.

FWIW, waveform_benchmark passes with libFLAC 1.3.3-2+deb11u2 (and libsndfile 1.0.31-2), but it also gives different and less efficient compression (397M bytes versus 362M), so not really conclusive.

@briangow
Copy link
Collaborator Author

briangow commented Jun 3, 2024

Thanks @bemoody . It seems reasonable to me to use your work-around for now given that we'd like to run benchmarking on FLAC compressed WFDB files soon.

Could you explain how to update the files / system / env for this work-around?

@bemoody
Copy link
Collaborator

bemoody commented Jun 4, 2024

It seems like the bug might be fixed in the latest libFLAC.

Tested: "can we perform random seeks to every sample number between 76075000 and 76079600?"

  • 1.3.3: working
  • 1.3.4: working
  • Broken by commit cbb039d2d629885a943a620f9828fdccec9cf28c
  • 1.4.0: broken
  • 1.4.1: broken
  • 1.4.2: broken
  • Fixed by commit 46bf04d0d74440a12be9508af508b4d373d696cf
  • 1.4.3: working

Compare MIT-LCP/wfdb-python#486 where I found that:

  • version A (using flac 1.3.3) was broken
  • version B (using flac 1.3.3) was broken
  • version C (using flac 1.4.2) was working
  • version D (using flac 1.4.3) was working

So it looks like we have two separate problems, but hopefully both are now fixed in libFLAC upstream, and should be fixed in the next version of soundfile.

For now I've installed the Debian unstable package (1.4.3) on the wavebench server.

@briangow
Copy link
Collaborator Author

briangow commented Jun 5, 2024

The WFDB compressed format is running without error on all of the files in the benchmarking waveform suite when using the updated package mentioned in the post above. For the purposes of this project, I will close this issue. It is being tracked in MIT-LCP/wfdb-python#488 .

@briangow briangow closed this as completed Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants