-
-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix file descriptor exhaustion when streaming many files #121
Fix file descriptor exhaustion when streaming many files #121
Conversation
var vfs = require('vinyl-fs'),
exhaust = require('stream-exhaust'),
exhaust(vfs.src('/dir/with/ton/of/felix-the-cat-images/honest/*.png', {buffer: false})
.pipe(vfs.dest('/elsewhere/'))); without this change I get about 4085 files into the copy and the process dies, after change I can copy all 9000 images. |
@Klowner nice catch on the 1st part - I'm a little unclear on the 2nd part, maybe we should trace that back further to find the root cause why files are passed to that function more than once? |
I should have been two PRs I suppose. I'd split it out but I'm at a wedding It is strange that it happens twice, but the On Sat, Oct 31, 2015, 11:57 AM contra notifications@github.com wrote:
|
@Klowner I think it's creating a new read stream every time it goes through dest, maybe? |
b1032b3
to
4c7837f
Compare
It looks like I suppose the BOM stripping should only happen once though? |
@Klowner the idea behind that is that when dest consumes the stream it should restart it before passing the file back out (by creating a new read stream as it is writing it). I guess the problem is that it doesn't close the old stream before creating the new stream which means every dest call will double the # of open streams for a short period |
@contra ahh, in retrospect that makes a lot of sense. So Those presumably just get garbage collected? Or do they have to go into some sort of "sink"? |
My apologies if I'm being dumb, but I'm pretty sure I at least understand why there's this issue of file descriptor exhaustion. With Once the stream is finished, then all of those can be garbage collected and associated file descriptors are freed, but the issue I'm seeing when there are enough files passing through the stream to exhaust the ulimit or whatever, the node process is running out of file descriptors and ends up dying (albeit very silently, thanks to graceful-fs?). I implemented a simple |
717952c
to
ea6cbec
Compare
@Klowner maybe explore making dest not re-create the stream from the file system at all? in dest fork the file.contents stream into two read streams, start writing one to the hd then replace file.contents with the forked tail and pass it down the stream |
@Klowner just did some digging, this could be interesting to play with https://www.npmjs.com/package/graceful-fs-stream |
ea6cbec
to
f079c90
Compare
@contra using graceful-fs-stream seems to work great, no more file descriptor exhaustion. Unfortunately it introduces an issue with the futime stuff, since that expects the stream's fd to be available. I think I'll wait for the futime tickets to get closed and then see what I can do. |
@Klowner the implementation inside graceful-fs-stream looks pretty straight forward, however, it monkeypatches the fs module, which we should probably avoid. If you were to implement those as standalone helpers in vfs, that way we could have access to the file descriptors, etc. |
@phated eep, thank you for pointing that out 👍 |
@Klowner will you have a chance to update this PR to avoid using graceful-fs-stream? |
Could also fix graceful-fs-stream to stop monkeypatching globals, your choice on which will be easiest though 🌴 |
I hope to get to it soon, December is always insane busy for me. |
I'd actually love for this to land in graceful-fs proper with tests |
graceful-fs seems to have tests for this at https://github.com/isaacs/node-graceful-fs/blob/master/test/read-write-stream.js#L14-L31 I changed the |
I don't think we're doing anything wrong here, I'm pretty sure the difference is that in those tests the file streams are being opened, written to, and closed. As opposed to vinyl-fs's Using something like graceful-fs prevents the file streams from being "opened" until something needs to read the file contents, so we avoid the whole file descriptor pile-up issue. I'm going to dig into it here this evening if all goes as planned. 👍 |
@Klowner Thanks for your time helping look into this, btw 🙌 |
The sad realization that my initial "solution" using graceful-fs-stream was because it was monkeypatching global |
f079c90
to
8fc3dd2
Compare
|
@Klowner that looks super cool. A couple things I'd like to get the maintainer to do would be only rely on readable-stream and bump to 1.0. Could you open an issue over there asking if those were possible? |
Also, do we need to update the Writeable to use this too? |
I'll check with their maintainer. I don't think the writable needs to be On Mon, Dec 21, 2015, 3:13 PM Blaine Bublitz notifications@github.com
|
@phated just the clarify, you'd like |
relevant issue jpommerening/node-lazystream#2 |
aaaaaand I grossly misunderstood 😝 |
e044e85
to
4b51102
Compare
4b51102
to
c63a797
Compare
Strange, looks like the |
@Klowner you can increase the timeout with |
@Klowner Is there a way you can write a test for this or would it make the test suite insanely long running? |
The manual test I performed with 9000 144kb files (~1.3GB) actually runs pretty quick, but that's a ton of disk space. I suppose if we could get |
I added a test, the downside is that it's a little slow and consumes about 25MB of disk space to operate, but that personally doesn't seem too unreasonable. Without lazystream, |
According to Travis docs the non-zero return code should signal a test failure, so this test should work! Let me know if that's cool for you, @phated 👍 |
@Klowner you, sir, are awesome. Thanks for all the work you put into this patch. |
Fix file descriptor exhaustion when streaming many files
You bet, sorry it took me so long :) |
Fix file descriptor exhaustion when streaming many files
It appears as though streamFile was creating duplicate ReadStreams,
which somehow was causing the process to never free WriteStreams. Also
I've changed the removeListeners calls to unlisten to
complete
ratherthan
cb
, since they're never told to listen tocb
in the firstplace.
I suspect this is related to #56 ?