FIX: Accept any valid delimiters/EOF markers in TCK files #720

soichih · 2019-01-27T21:54:54Z

According to the MRtrix TCK specification:

The binary track data themselves are stored as triplets of floating-point values (at this stage in 32 bit floating-point format), one per vertex along the track. Tracks are separated using a triplet of NaN values. Finally, a triplet of Inf values is used to indicate the end of the file.

https://mrtrix.readthedocs.io/en/latest/getting_started/image_data.html#tracks-file-format-tck

The current implementation of _read() in streamlines/tck.py expects these delimiters to be in one specific binary format (whatever numpy.nan and numpy.inf happen to be). However, according to the IEEE 754 standard, NaN can have many different binary representations, and Inf can carry either sign.

IEEE 754 NaNs are represented with the exponent field filled with ones (like infinity values), and some non-zero number in the significand (to make them distinct from infinity values);

https://en.wikipedia.org/wiki/NaN
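To make the point about multiple representations concrete, here is a small sketch (not taken from the PR) that builds several float32 bit patterns by hand. The particular patterns are chosen for illustration only; any value with all exponent bits set and a non-zero significand would do.

```python
import numpy as np

# Four distinct float32 bit patterns: exponent bits all ones, significand non-zero.
nan_bits = np.array([0x7FC00000, 0xFFC00000, 0x7F800001, 0x7FFFFFFF], dtype=np.uint32)
nans = nan_bits.view(np.float32)  # reinterpret the same bytes as float32

# Exponent bits all ones, significand zero: +Inf and -Inf.
inf_bits = np.array([0x7F800000, 0xFF800000], dtype=np.uint32)
infs = inf_bits.view(np.float32)

# Value-based checks accept every pattern.
all_nan = bool(np.isnan(nans).all())
all_inf = bool(np.isinf(infs).all())

# A byte-for-byte comparison against NumPy's own NaN can match at most one of them.
reference = np.array([np.nan], dtype=np.float32).view(np.uint32)[0]
matches_reference = [int(b) == int(reference) for b in nan_bits]

print(all_nan, all_inf, matches_reference)
```

A reader that compares raw bytes against a single reference pattern would therefore miss most of these delimiters, while `np.isnan` and `np.isinf` recognise all of them.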

Therefore, streamlines generated by other programming languages (such as Matlab) can store these NaN and Inf values in a slightly different binary format, and nibabel.streamlines.tck fails to detect these delimiters, which results in the following error:

brainlife/app-convert-tck-to-trk#1 (comment)

I have re-implemented _read() so that it actually checks each value to see whether it is NaN when deciding whether the values form a streamline delimiter. My code runs around 30% slower (4.7 s vs. 3.6 s for loading 500k streamlines), but I believe proper checking of NaN/Inf is unavoidable if this code is to load streamlines generated by other programming languages (and numpy might change the definition of numpy.nan and numpy.inf in the future).
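The value-based check described above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the code in this PR: the whole file body is assumed to be already in memory as float32 triplets, and `split_streamlines` is a hypothetical helper name.

```python
import numpy as np

def split_streamlines(points):
    """Split (N, 3) float32 points at all-NaN rows, stopping at the first all-Inf row."""
    points = np.asarray(points, dtype=np.float32).reshape(-1, 3)
    eof_rows = np.flatnonzero(np.isinf(points).all(axis=1))
    if eof_rows.size == 0:
        raise ValueError("missing EOF delimiter")
    points = points[: eof_rows[0]]

    streamlines = []
    begin = 0
    for delim in np.flatnonzero(np.isnan(points).all(axis=1)):
        streamlines.append(points[begin:delim])
        begin = delim + 1
    if begin != len(points):
        raise ValueError("missing streamline delimiter before EOF")
    return streamlines

# Demo data: two streamlines whose delimiters use different NaN bit patterns.
data = np.zeros((6, 3), dtype=np.float32)
data[1] = [1, 1, 1]
data[3] = [2, 2, 2]
bits = data.view(np.uint32)   # same memory, viewed as raw bit patterns
bits[2] = 0xFFC00000          # NaN with the sign bit set
bits[4] = 0x7FC00000          # NaN without the sign bit
bits[5] = 0x7F800000          # +Inf marks the end of the data

streams = split_streamlines(data)
print([s.shape for s in streams])
```

Because the test is on the value rather than the bytes, both delimiter rows are detected even though their bit patterns differ.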

matthew-brett · 2019-01-28T10:14:32Z

Thanks for this.

Can you add a test, maybe with a currently failing small tck file?

Can you think of a way of speeding up again? Can we first check what the binary format is, then use that in subsequent reads from the same file?

soichih · 2019-01-28T17:39:15Z

@matthew-brett Yes, I can try adding a unit test. Currently though.. "make test" takes a really long time.. (it's stuck on 99% CPU usage for >30 min when I try it) Is it normal?

Can we first check what the binary format is, then use that in subsequent reads from the same file?

Maybe.. but I think it's a bit risky.. a tck file could be generated by concatenating different .tck files from different sources, so depending on the implementation, it could end up with a mix of different binary formats within a single file (I can think of a case where someone passes along the same object buffer that was loaded from a file and appends it to the new aggregated file)

Using a proper float32 NaN/Inf function is slower, but loading 600K fibers would only take a few seconds, so I wouldn't worry too much about optimizing this. I think it's best to do this the right (and simpler) way; looking for a specific binary pattern for NaN/Inf, or for any floating-point value for that matter, is a bit iffy.

I've attached a sample .tck file that fails to load with the current implementation.

matlab.zip

effigies

A couple comments that may explain at least some of the slowdown. I think you're accidentally coercing to float64, and doing many more appends than necessary.

nibabel/streamlines/tck.py

soichih · 2019-01-28T22:03:06Z

@effigies Thanks for the suggestion. I've applied a few of your suggestions and now my code runs about 5% faster compared to the current production version (instead of slower). I assumed that numpy.append with an empty array is a no-op, but I guess I was wrong.
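For readers wondering why appending an empty array is not free, a short standalone sketch (illustrative only, not the PR's code) shows the two effects mentioned in the review: `numpy.append` always builds a new array, and an empty array created with `np.array([])` defaults to float64, which upcasts float32 data.

```python
import numpy as np

chunk = np.ones(3, dtype=np.float32)

# np.array([]) is float64 by default, so the result is promoted to float64.
appended = np.append(np.array([]), chunk)

# Even with a matching dtype, np.append copies into a fresh array.
same_values = np.append(chunk, np.array([], dtype=np.float32))

# Collecting the pieces in a list and concatenating once avoids repeated copies.
pieces = [chunk, chunk * 2, chunk * 3]
joined = np.concatenate(pieces)

print(appended.dtype, np.shares_memory(same_values, chunk), joined.dtype)
```

Building a list of chunks and calling `np.concatenate` a single time keeps the dtype at float32 and does one copy instead of one per append.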

I've kept most of my original code, as your version doesn't handle the EOF delimiter checking correctly. I think there is a way to make it work.. but I thought mine is a bit simpler. I still can't get the unittest to work, but I've tested my code with various input files I had and made sure that it loads the correct number of fibers/streams.

matthew-brett · 2019-01-29T09:49:12Z

For some reason I can't get the Travis-CI page with the test failures - can you describe them?

Can you add a small test file written by Matlab to confirm this fixes the read?

soichih · 2019-01-30T02:26:48Z

@matthew-brett I've added matlab_nan.tck in tests/data. Please let me know if there is anything else I should do.

effigies · 2019-01-31T03:52:15Z

The test failures occur because numpy expects a real file object in fromfile, which fails if we have any stream that isn't that. I will try a bytearray fix...
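A minimal sketch of the bytearray/frombuffer idea mentioned above, assuming little-endian float32 data and a file-like object that supports `readinto`. The helper name `read_float32_triplets` is hypothetical and is not part of nibabel.

```python
import io
import numpy as np

def read_float32_triplets(fileobj, n_points):
    """Read up to n_points float32 triplets from any binary file-like object."""
    buf = bytearray(n_points * 3 * 4)   # 3 values per point, 4 bytes per value
    n_read = fileobj.readinto(buf)      # works for in-memory streams too
    n_values = (n_read - n_read % 12) // 4   # keep only complete triplets
    return np.frombuffer(buf, dtype="<f4")[:n_values].reshape(-1, 3)

# Usage with an in-memory stream rather than a file on disk.
stream = io.BytesIO(np.arange(6, dtype="<f4").tobytes())
pts = read_float32_triplets(stream, n_points=10)
print(pts)
```

Since `np.frombuffer` only needs an object exposing the buffer protocol, this approach does not depend on the stream being backed by an operating-system file.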

RF: Use bytearray/frombuffer and other minor fixes

coveralls · 2019-01-31T04:45:55Z

Coverage increased (+0.009%) to 91.821% when pulling 486bbb2 on soichih:master into ad6b890 on nipy:master.

FIX: Restore missing delimiter error message

codecov-io · 2019-01-31T15:38:29Z

Codecov Report

Merging #720 into master will increase coverage by <.01%.
The diff coverage is 90.47%.

@@            Coverage Diff            @@
##           master    #720      +/-   ##
=========================================
+ Coverage    89.1%   89.1%   +<.01%     
=========================================
  Files          93      93              
  Lines       11468   11469       +1     
  Branches     1991    1990       -1     
=========================================
+ Hits        10218   10220       +2     
+ Misses        911     910       -1     
  Partials      339     339

Impacted Files	Coverage Δ
nibabel/cifti2/cifti2.py	`96.38% <0%> (ø)`	⬆️
nibabel/streamlines/tck.py	`99.45% <95%> (+0.54%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ad6b890...486bbb2.

effigies

LGTM. Thanks for tolerating my refactors. One tiny thing I failed to clean up.

@matthew-brett @MarcCote Do one or both of you have time for a quick look? In particular I'd appreciate a check that the logic of the DataError matches the original intent; I tried to replicate the cases that would produce each message, but the structure changed a bit.

nibabel/streamlines/tck.py

matthew-brett · 2019-01-31T17:23:41Z

@effigies - sorry - I'm completely overwhelmed with work at the moment, will be until Wednesday or so. @MarcCote - do you mind having a look?

effigies · 2019-01-31T17:40:29Z

Thanks for the heads up Matthew.

MarcCote · 2019-01-31T18:45:53Z

Thanks for pinging me. I'm not explicitly watching Nibabel repo anymore but I'm happy to help whenever I can. I'll have a look tonight.

MarcCote · 2019-02-01T15:25:16Z

nibabel/streamlines/tck.py

+                        n_streams += 1
+                    begin = delim + 1
+
+                # The rest gets appended to the leftover


Suggested change

-                # The rest gets appended to the leftover
+                # The rest becomes the new leftover.

Could you apply this suggestion, as well?
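For context on what "leftover" means in the comment under discussion, here is a rough sketch of chunked reading in which points after the last delimiter of a chunk are carried into the next one. It is an illustration written for this discussion, not the code under review; `iter_streamlines` and `chunk_points` are hypothetical names, and little-endian float32 data is assumed.

```python
import io
import numpy as np

def iter_streamlines(fileobj, chunk_points=4):
    """Yield (n, 3) float32 arrays, reading the stream chunk by chunk."""
    leftover = np.empty((0, 3), dtype="<f4")
    while True:
        raw = fileobj.read(chunk_points * 12)
        if not raw:
            raise ValueError("missing EOF delimiter")
        if len(raw) % 12:
            raise ValueError("truncated point data")
        new_pts = np.frombuffer(raw, dtype="<f4").reshape(-1, 3)
        pts = np.concatenate([leftover, new_pts])

        eof_rows = np.flatnonzero(np.isinf(pts).all(axis=1))
        end = eof_rows[0] if eof_rows.size else len(pts)

        begin = 0
        for delim in np.flatnonzero(np.isnan(pts[:end]).all(axis=1)):
            yield pts[begin:delim]
            begin = delim + 1

        if eof_rows.size:
            if begin != end:
                raise ValueError("missing streamline delimiter before EOF")
            return
        # The rest becomes the new leftover.
        leftover = pts[begin:]

# Demo: a 5-point streamline that spans two chunks, then a 2-point streamline.
nan_row = np.full((1, 3), np.nan, dtype="<f4")
inf_row = np.full((1, 3), np.inf, dtype="<f4")
first = np.arange(15, dtype="<f4").reshape(5, 3)
second = np.ones((2, 3), dtype="<f4")
body = np.concatenate([first, nan_row, second, nan_row, inf_row]).tobytes()

lengths = [len(s) for s in iter_streamlines(io.BytesIO(body))]
print(lengths)
```

In this sketch the leftover is replaced on every pass rather than grown, which is the behaviour the reworded comment describes.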

MarcCote

This looks good to me. Thank you @soichih.

effigies · 2019-02-01T15:32:33Z

Thanks, @MarcCote. @soichih If you have a couple minutes to apply the two suggestions, we can go ahead and merge ASAP.

soichih · 2019-02-04T05:01:47Z

@effigies Did I do it right? I believe I've applied @MarcCote 's suggestions.

effigies · 2019-02-04T14:58:02Z

Great! Thanks!

fixed a bug where tck _read() is expecting specific binary expression for fiber_delimiter and eof_delimiter.

a841055

effigies reviewed Jan 28, 2019

optimized performance (now it's 4% faster than the original code)

cd2a2b9

added matlab_nan.tck, a test tck file containing NaN and Inf in a different binary format from the current numpy.nan and numpy.inf

62c0cda

effigies and others added 7 commits January 30, 2019 23:05

RF: Reduce concatenations further, moderate cleanups

8ed4aca

FIX: Return to bytearray/frombuffer approach

74c3410

FIX: Check final delimiter is ONLY infs

10ae9bc

TEST: Simple load test for matlab_nan.tck

0a75431

Merge pull request #1 from effigies/fix/streamlines_infnan

a848bb4

RF: Use bytearray/frombuffer and other minor fixes

STY: Reduce diff

676df4d

RF: Restore missing streamline delimiter error

4e1cab2

effigies and others added 2 commits January 30, 2019 23:49

STY: Pacify flake8

196f13a

Merge pull request #2 from effigies/fix/streamlines_infnan

b98ba4f

FIX: Restore missing delimiter error message

effigies approved these changes Jan 31, 2019

effigies changed the title ~~fixed a bug where streamlines/tck can't read .tck generated by matlab~~ FIX: Accept any valid delimiters/EOF markers in TCK files Jan 31, 2019

MarcCote reviewed Feb 1, 2019

MarcCote approved these changes Feb 1, 2019

soichih added 2 commits February 4, 2019 00:05

reapplied @MarcCote's suggestion.

bac1988

applied another @MarcCote suggestion

486bbb2

effigies merged commit 4b6ca81 into nipy:master Feb 4, 2019

effigies added this to the 2.4.0 milestone Mar 13, 2019
