Optimize RDAIO #759

alyst · 2015-01-05T19:29:14Z

Another round of RDA performance polishing:

Make long R vector support optional and disable it by default on 32-bit systems, as R itself does. It's unlikely that many people work with big data frames on 32-bit systems, and in all other cases iterating over an Int64 variable just slows everything down.
Get a >20% improvement in time and memory by improving type inference for the IO variable. Before:

[~]$ julia -e "using DataFrames; @time read_rda(\"test.RData\")"
elapsed time: 15.414509979 seconds (2433718088 bytes allocated, 5.44% gc time)

After:

[~]$ julia -e "using DataFrames; @time read_rda(\"test.RData\")"
elapsed time: 12.278712969 seconds (1834821860 bytes allocated, 3.38% gc time)
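The type-inference gain described above can be illustrated with a minimal sketch. It assumes current Julia syntax (`struct` rather than the `type` keyword in use at the time of this PR), and the names `AbstractFieldReader`, `ParamReader`, `read_int32` and `sum_int32` are hypothetical, not the package's actual RDAXXXIO types. It only shows the general technique of parameterizing a wrapper on its stream type instead of storing an abstract `IO` field.

```julia
# Hypothetical wrappers, named for illustration only.
struct AbstractFieldReader
    io::IO        # abstract field type: calls on `io` are dispatched at run time
end

struct ParamReader{T<:IO}
    io::T         # concrete once T is known, so reads can be specialized
end

# Read one big-endian Int32 and convert it to host byte order.
read_int32(r) = ntoh(read(r.io, Int32))

# Sum n big-endian Int32 values from either kind of reader.
function sum_int32(r, n)
    total = 0
    for _ in 1:n
        total += read_int32(r)
    end
    return total
end

# Build a small big-endian test stream containing 1, 2, 3, 4.
buf = IOBuffer()
for v in Int32(1):Int32(4)
    write(buf, hton(v))
end
data = take!(buf)

slow = sum_int32(AbstractFieldReader(IOBuffer(data)), 4)
fast = sum_int32(ParamReader(IOBuffer(data)), 4)
```

Both calls return the same value; the difference is only in how much the compiler knows about `r.io`, which can be inspected with `@code_warntype sum_int32(ParamReader(IOBuffer(data)), 4)`.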

coveralls · 2015-01-05T19:35:17Z

Coverage increased (+0.01%) when pulling e6fc839 on alyst:optimize_RDAIO into 0f21922 on JuliaStats:master.

garborg · 2015-01-06T00:54:38Z

Thanks, @alyst.

What did you find out on the NaN/NA swap we're seeing for ASCII data?

alyst · 2015-01-06T11:57:40Z

@garborg For ASCII/ASCIIhex output they just write NA whenever ISNAN(d) is true, which discards the NA/NaN distinction, whereas for the XDR/native binary formats they write the real as is. I guess it's a rather long-standing R bug, but I didn't find any discussion or bug reports.
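To make the distinction concrete, here is a small sketch in current Julia syntax. It relies on R representing NA_real_ as a NaN whose low 32-bit word is 1954; the helpers `is_r_na` and `ascii_token` are hypothetical names, and `ascii_token` only mimics the lossy behaviour described in this comment, it is not R's actual code.

```julia
# Bit pattern R uses for NA_real_: a NaN whose low 32-bit word is 1954 (0x7a2).
const R_NA_FLOAT64 = reinterpret(Float64, 0x7ff00000000007a2)

# Hypothetical helper: true only for R's NA, not for an ordinary NaN.
is_r_na(x::Float64) = isnan(x) && (reinterpret(UInt64, x) % UInt32) == 0x000007a2

# Mimics a writer that checks only isnan, so NA and NaN become the same token.
ascii_token(x::Float64) = isnan(x) ? "NA" : string(x)

na_token  = ascii_token(R_NA_FLOAT64)
nan_token = ascii_token(NaN)
```

A binary writer that stores the eight bytes unchanged keeps the payload, so `is_r_na` can still tell the two values apart after a round trip; a text writer that tests only `isnan` cannot.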

garborg · 2015-01-06T14:36:19Z

Ah, interesting, I understand the comment in the tests now. Thanks for expanding.

alyst · 2015-01-06T18:39:50Z

Filed R-project bug 16137.

garborg · 2015-01-06T18:45:44Z

Nice!

StefanKarpinski · 2015-01-07T20:49:05Z

It's kind of amazing that our issue counter is rapidly catching up to the R project.

alyst added 4 commits January 5, 2015 20:13

enable NA ASCII tests

6845c24

use the proper byte ordering function

ea79b06

from network (big-endian) IO stream to host (ntoh), not hton

parameterize RDAXXXIO to avoid abstract IO field

4d5bd63

improves type inference in atomic reads, performance and memory allocation

make long R vectors support optional

e6fc839

on 32-bit versions supporting long vectors might slow down RDA reading
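The byte-ordering and long-vector commits above can be sketched together. This is a hedged illustration in current Julia syntax, not the code from these commits: `read_xdr_int32`, `read_length` and the `long_vectors` keyword are hypothetical names. It assumes, as far as I understand R's serialization format, that a long vector is marked by a length of -1 followed by the upper and lower 32-bit halves of the real length.

```julia
# Hypothetical helper: read one big-endian (XDR) Int32 and convert it to host order.
# ntoh states the intent (network to host); hton would perform the same byte swap
# on any given host, so the fix is about correctness of meaning, not of results.
read_xdr_int32(io::IO) = ntoh(read(io, Int32))

# Hypothetical sketch of reading a vector length with optional long-vector support.
# By default long vectors are enabled only where Int is 64 bits wide.
function read_length(io::IO; long_vectors::Bool = (Int === Int64))
    len = read_xdr_int32(io)
    len >= 0 && return Int64(len)
    len == -1 || error("negative vector length: $len")
    long_vectors || error("long vectors are not supported in this configuration")
    hi = read_xdr_int32(io)   # upper 32 bits of the length
    lo = read_xdr_int32(io)   # lower 32 bits, to be read as unsigned
    return (Int64(hi) << 32) + Int64(reinterpret(UInt32, lo))
end

# An ordinary length of 3, stored big-endian.
short_len = read_length(IOBuffer(UInt8[0x00, 0x00, 0x00, 0x03]))

# A long-vector marker (-1) followed by hi = 1, lo = 0, i.e. a length of 2^32.
long_bytes = UInt8[0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00]
long_len = read_length(IOBuffer(long_bytes); long_vectors = true)
```

With `long_vectors = false` the same stream raises an error instead of silently misreading the length, which is one way an opt-out could behave on 32-bit builds.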

garborg added a commit that referenced this pull request Jan 6, 2015

Merge pull request #759 from alyst/optimize_RDAIO

99be77f

Optimize RDAIO

garborg merged commit 99be77f into JuliaData:master Jan 6, 2015

nalimilan pushed a commit that referenced this pull request May 26, 2022

Merge pull request #759 from alyst/optimize_RDAIO

2eddbc8

Optimize RDAIO
