realloc(18446744039349813248) failed. #147

Closed
sjackman opened this issue Jul 11, 2017 · 21 comments

@sjackman
Contributor

sjackman commented Jul 11, 2017

Hi, John. I'm running mlr to calculate count,p25,p50,p75,mean,stddev of one integer column with three billion rows, one row per nucleotide of the human genome. It fails with the error message realloc(18446744039349813248) failed. The machine in question has 2.5 terabytes of RAM, so it should have enough RAM to hold the column in memory, about 24 GB at 8 bytes per row. Is the bug possibly caused by holding the number of rows in a 32-bit int rather than a 64-bit size_t?

❯❯❯ mlr --tsvlite stats1 -a count,p25,p50,p75,mean,stddev -f Depth foo.tsv
realloc(18446744039349813248) failed.
❯❯❯ wc -l abyss2.hg004.bx.as100.nm5.bam.mi.bx.molecule.size2000.bed.depth.tsv
❯❯❯ head foo.tsv
Rname	Pos	Depth
1046	1	0
1046	2	0
1046	3	0
1046	4	0
1046	5	0
1046	6	0
1046	7	0
1046	8	0
1046	9	0
❯❯❯ mlr --version
Miller 5.0.1
@sjackman
Contributor Author

If it's relevant, I only really need the median and IQR.

@johnkerl
Owner

Spot-on re 32-bit ints:

$ d2h 18446744039349813248
fffffff800000000

I definitely intend to handle data > 4 GB, but I've tested that path very little; clearly I missed a callsite.
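
For illustration, here is a minimal sketch of how this class of bug produces exactly that realloc argument. This is not Miller's actual code: the wrapped doubling counter and the 16-byte element size are assumptions chosen to reproduce the reported number.

#include <limits.h>
#include <stdio.h>

int main(void) {
    /* A 32-bit capacity counter that has doubled past INT_MAX wraps,
       on typical two's-complement hardware, to INT_MIN. */
    int capacity = INT_MIN;

    /* Converting the negative int to size_t sign-extends its bit pattern
       to 64 bits before the multiply, so a (hypothetical) 16-byte element
       size reproduces the value from the error message. */
    size_t nbytes = (size_t)capacity * 16;

    printf("%zu\n", nbytes);  /* 18446744039349813248 on LP64 systems */
    return 0;
}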

@sjackman
Contributor Author

Thanks for looking into it, John!

@johnkerl
Owner

johnkerl commented Jul 12, 2017

@sjackman I found one spot, at least. Fix committed to head & passing regression tests.

Now to validate ...

mlr -n seqgen --stop 5000000000 then stats1 -a sum,count,min,max,p50 -f i

runs me out of RAM entirely on my laptop (not surprising since seqgen is non-streaming); I'll test on bigger hardware maybe tomorrow.

@sjackman
Contributor Author

Thanks for the quick fix, John! I appreciate it. I'll test it in August when I'm back from travels.

@sjackman
Contributor Author

To test without the non-streaming seqgen, you can use seq:

seq 5000000000 | mlr stats1 -a sum,count,min,max,p50 -f 1

@johnkerl
Owner

@sjackman thanks! FWIW I ran out of RAM on my larger host too. (Your 2.5T hardware is impressive indeed.) Let me know how it works for you.

@sjackman
Contributor Author

Do you have an estimate of how much RAM you expect it to use?

@sjackman
Contributor Author

It looks like this command will take about 80 GB of RAM to run. It's using 8 GB at the 10% mark.

❯❯❯ seq 5000000000 | pv -pls 5000000000 | mlr stats1 -a sum,count,min,max,p50 -f 1
[======>                                                                   ] 10%
❯❯❯ top -p 167197
   PID USER      PR  NI    VIRT    RES  %CPU %MEM     TIME+ S COMMAND          
167197 sjackman  20   0  9.773g 7.975g 100.0  0.3  10:22.94 R mlr              

@sjackman
Contributor Author

Still running at 75% now and 55 GB of memory usage. Looks promising.

@jungle-boogie
Contributor

That's one impressive machine - 2.5 TB of RAM!

If Miller works out, I think it deserves a little write-up of how you're using it.

@sjackman
Contributor Author

Memory usage has levelled off at 91 GB. Now it's thinking hard.

@sjackman
Contributor Author

It worked! It took 2 hours of elapsed time. Would there be any speed gains from multithreading parts of Miller?

❯❯❯ time sh -c 'seq 5000000000 | pv -pls 5000000000 | mlr stats1 -a sum,count,min,max,p50 -f 1'
1_sum=12500000002147352576.000000,1_count=5000000000,1_min=1,1_max=5000000000,1_p50=2500000001
7019.94user 1072.76system 2:05:20elapsed 107%CPU (0avgtext+0avgdata 468754832maxresident)k
7864inputs+24outputs (8major+105656880minor)pagefaults 0swaps

Does 468754832maxresident mean 469 GB of RAM?

@johnkerl
Owner

2 hours is fast for that data size, I think -- given single-threaded execution.

Miller is single-threaded by design; it's a little command-line tool for those times when you don't want to bring out the big guns (Hadoop or whatever).

My experience with this kind of processing over the years is that disk reads and data parsing take up the lion's share of the time, and in-core computations are relatively small. So multi-threading helps a little, but the disk is still single-threaded, as it were. :^/ So I kept the code single-threaded and simple.

If disk files can be split up across machines then there is some parallelism to be had, even for single-threaded programs like Miller. (I.e. run multiple instances of simple programs over files on multiple hosts.)

Mean, sum, count, min, max are easily distributable. Percentiles not so much. :^/
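
To illustrate the distributability point, here is a minimal sketch (the struct and field names are hypothetical, not Miller's internals): per-shard partials for sum/count/min/max merge associatively, so they can be combined across hosts in any order.

#include <stdio.h>

/* Hypothetical per-shard partial aggregate. */
typedef struct {
    double sum;
    long long count;
    double min, max;
} partial_t;

/* sum/count/min/max combine associatively, so shards merge in any order. */
static partial_t merge(partial_t a, partial_t b) {
    partial_t out;
    out.sum   = a.sum + b.sum;
    out.count = a.count + b.count;
    out.min   = a.min < b.min ? a.min : b.min;
    out.max   = a.max > b.max ? a.max : b.max;
    return out;
}

int main(void) {
    partial_t shard1 = {15.0, 5, 1.0, 5.0};   /* e.g. values 1..5 on host A */
    partial_t shard2 = {40.0, 5, 6.0, 10.0};  /* e.g. values 6..10 on host B */
    partial_t total  = merge(shard1, shard2);
    printf("sum=%g count=%lld min=%g max=%g mean=%g\n",
           total.sum, total.count, total.min, total.max,
           total.sum / (double)total.count);
    return 0;
}

There is no analogous merge for an exact p50: the median of two shards' medians is not the median of the combined data, because the ordering information is lost.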

@sjackman
Contributor Author

Makes sense to me. Thanks again for the quick fix, John!

@sjackman
Contributor Author

Is a stable release with this fix imminent? I'll update the Homebrew/Linuxbrew formula for Miller.

@johnkerl
Owner

Yeah, now that you've verified it I'll cut a bugfix release in the next few days. I usually update Homebrew as part of the process; no need for you to duplicate that.

Thanks @sjackman!!!

@sjackman
Contributor Author

Great. Thanks, John!

@johnkerl
Owner

Homebrew/homebrew-core#15788

@sjackman
Contributor Author

Thanks, John!
