realloc(18446744039349813248) failed. #147

Closed
sjackman opened this issue Jul 11, 2017 · 21 comments

@sjackman
Contributor

sjackman commented Jul 11, 2017

Hi, John. I'm running mlr to calculate count,p25,p50,p75,mean,stddev of one integer column with three billion rows, one row per nucleotide of the human genome. It fails with the error message realloc(18446744039349813248) failed. The machine in question has 2.5 terabytes of RAM, so it should have enough RAM to hold the column in memory, about 24 GB at 8 bytes per row. Is the bug possibly caused by holding the number of rows in a 32-bit int rather than a 64-bit size_t?

❯❯❯ mlr --tsvlite stats1 -a count,p25,p50,p75,mean,stddev -f Depth foo.tsv
realloc(18446744039349813248) failed.
❯❯❯ wc -l abyss2.hg004.bx.as100.nm5.bam.mi.bx.molecule.size2000.bed.depth.tsv
❯❯❯ head foo.tsv
Rname	Pos	Depth
1046	1	0
1046	2	0
1046	3	0
1046	4	0
1046	5	0
1046	6	0
1046	7	0
1046	8	0
1046	9	0
❯❯❯ mlr --version
Miller 5.0.1
@sjackman
Contributor Author

If it's relevant, I only really need the median and IQR.

@johnkerl
Owner

Spot-on re 32-bit ints:

$ d2h 18446744039349813248
fffffff800000000

I definitely intend to handle data > 4 GB, but I've tested that path very little; clearly I missed a callsite.
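
For illustration, here is a minimal sketch of how this class of bug produces exactly that realloc argument. This is not Miller's actual code: the wrapped doubling counter and the 16-byte element size are assumptions chosen to reproduce the reported number.

#include <limits.h>
#include <stdio.h>

int main(void) {
    /* A 32-bit capacity counter that has doubled past INT_MAX wraps,
       on typical two's-complement hardware, to INT_MIN. */
    int capacity = INT_MIN;

    /* Converting the negative int to size_t sign-extends its bit pattern
       to 64 bits before the multiply, so a (hypothetical) 16-byte element
       size reproduces the value from the error message. */
    size_t nbytes = (size_t)capacity * 16;

    printf("%zu\n", nbytes);  /* 18446744039349813248 on LP64 systems */
    return 0;
}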

@sjackman
Contributor Author

Thanks for looking into it, John!

@johnkerl
Owner

johnkerl commented Jul 12, 2017

@sjackman I found one spot, at least. Fix committed to head & passing regression tests.

Now to validate ...

mlr -n seqgen --stop 5000000000 then stats1 -a sum,count,min,max,p50 -f i

runs me out of RAM entirely on my laptop (not surprising since seqgen is non-streaming); I'll test on bigger hardware maybe tomorrow.

@sjackman
Contributor Author

Thanks for the quick fix, John! I appreciate it. I'll test it in August when I'm back from travels.

@sjackman
Contributor Author

To test without the non-streaming seqgen, you can use seq:

seq 5000000000 | mlr stats1 -a sum,count,min,max,p50 -f 1

@johnkerl
Owner

@sjackman thanks! FWIW I ran out of RAM on my larger host too. (Your 2.5T hardware is impressive indeed.) Let me know how it works for you.

@sjackman
Contributor Author

Do you have an estimate of how much RAM you expect it to use?

@sjackman
Contributor Author

It looks like this command will take about 80 GB of RAM to run. It's using 8 GB at the 10% mark.

❯❯❯ seq 5000000000 | pv -pls 5000000000 | mlr stats1 -a sum,count,min,max,p50 -f 1
[======>                                                                   ] 10%
❯❯❯ top -p 167197
   PID USER      PR  NI    VIRT    RES  %CPU %MEM     TIME+ S COMMAND          
167197 sjackman  20   0  9.773g 7.975g 100.0  0.3  10:22.94 R mlr              

@sjackman
Contributor Author

Still running at 75% now and 55 GB of memory usage. Looks promising.

@jungle-boogie
Contributor

That's one impressive machine - 2.5 TB of RAM!

If Miller works out, I think it deserves a little write-up of how you're using it.

@sjackman
Contributor Author

Memory usage has levelled off at 91 GB. Now it's thinking hard.

@sjackman
Contributor Author

It worked! It took 2 hours of elapsed time. Would there be any speed gains from multithreading parts of Miller?

❯❯❯ time sh -c 'seq 5000000000 | pv -pls 5000000000 | mlr stats1 -a sum,count,min,max,p50 -f 1'
1_sum=12500000002147352576.000000,1_count=5000000000,1_min=1,1_max=5000000000,1_p50=2500000001
7019.94user 1072.76system 2:05:20elapsed 107%CPU (0avgtext+0avgdata 468754832maxresident)k
7864inputs+24outputs (8major+105656880minor)pagefaults 0swaps

Does 468754832maxresident mean 469 GB of RAM?

@johnkerl
Owner

2 hours is fast for that data size, I think -- given single-threaded execution.

Miller is single-threaded by design; it's a little command-line tool for those times when you don't want to bring out the big guns (Hadoop or whatever).

My experience with this kind of processing over the years is that disk reads and data parsing take up the lion's share of the time, and in-core computations are relatively small. So multi-threading helps a little, but the disk is still single-threaded, as it were. :^/ So I kept the code single-threaded and simple.

If disk files can be split up across machines then there is some parallelism to be had, even for single-threaded programs like Miller. (I.e. run multiple instances of simple programs over files on multiple hosts.)

Mean, sum, count, min, max are easily distributable. Percentiles not so much. :^/
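
To illustrate the distributability point, here is a minimal sketch (the struct and field names are hypothetical, not Miller's internals): per-shard partials for sum/count/min/max merge associatively, so they can be combined across hosts in any order.

#include <stdio.h>

/* Hypothetical per-shard partial aggregate. */
typedef struct {
    double sum;
    long long count;
    double min, max;
} partial_t;

/* sum/count/min/max combine associatively, so shards merge in any order. */
static partial_t merge(partial_t a, partial_t b) {
    partial_t out;
    out.sum   = a.sum + b.sum;
    out.count = a.count + b.count;
    out.min   = a.min < b.min ? a.min : b.min;
    out.max   = a.max > b.max ? a.max : b.max;
    return out;
}

int main(void) {
    partial_t shard1 = {15.0, 5, 1.0, 5.0};   /* e.g. values 1..5 on host A */
    partial_t shard2 = {40.0, 5, 6.0, 10.0};  /* e.g. values 6..10 on host B */
    partial_t total  = merge(shard1, shard2);
    printf("sum=%g count=%lld min=%g max=%g mean=%g\n",
           total.sum, total.count, total.min, total.max,
           total.sum / (double)total.count);
    return 0;
}

There is no analogous merge for an exact p50: the median of two shards' medians is not the median of the combined data, because the ordering information is lost.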

@sjackman
Contributor Author

Makes sense to me. Thanks again for the quick fix, John!

@sjackman
Contributor Author

Is a stable release with this fix imminent? I'll update the Homebrew/Linuxbrew formula for Miller.

@johnkerl
Owner

Yeah, now that you've verified it I'll cut a bugfix release in the next few days. I usually update Homebrew as part of the process; no need for you to duplicate that.

Thanks @sjackman!!!

@sjackman
Contributor Author

Great. Thanks, John!

@johnkerl
Owner

Homebrew/homebrew-core#15788

@sjackman
Contributor Author

Thanks, John!
