Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use minimatch to handle negative globs, fixes #2; Run positive globs in parallel, fixes #3 #6

Closed
wants to merge 2 commits into from

Conversation

UltCombo
Copy link
Contributor

Fixes #2
Fixes #3

Negative globs handling is over 2.5x faster now in regards to my previous benchmark.

…in parallel, fixes #3;

Negative globs handling is over 2.5x faster now in regards to my previous benchmark.
@UltCombo
Copy link
Contributor Author

Here's a diagram of the control flow this PR implements:

globby

@sindresorhus PTAL when you have time.

@UltCombo UltCombo force-pushed the perf branch 2 times, most recently from c35f7db to 4c4a126 Compare December 30, 2014 23:04
@UltCombo
Copy link
Contributor Author

(moved benchmarking to #10)

Benchmarks before this PR:

image

Benchmarks after this PR:

image

Looks like the negative glob perf improvement is not visible in this benchmark because all the negated files are in the same directory, which is far less consuming than walking deeply nested directories (as node_modules/** would). Guess I'll need to tweak the benchmark.

@UltCombo
Copy link
Contributor Author

The benchmarks above were using 100 negated files. It looks like the minimatch approach is only faster when there are > 500 negated files (my initial test before proposing this PR was using 11,000 negated files).

@UltCombo
Copy link
Contributor Author

Improved tests:
image

It is odd how the current master's negative globs handling is faster in some cases -- node-glob reads from disk and matches using minimatch, it should always be slower than just using minimatch directly. Perhaps it is because they use the Minimatch class.

@UltCombo
Copy link
Contributor Author

(Travis data in comment above was wrong, removed it)

And yes, using the Minimatch class is faster, now the perf branch is always faster than master:

image

You need to merge/rebase the bench branch onto perf to run the bench.

@UltCombo
Copy link
Contributor Author

@sindresorhus do you want to do some review (code style validation, etc.) or should I just merge?

@UltCombo
Copy link
Contributor Author

Sorry for the spam of screenshots, the only important one is the latest one, the rest are outdated.

@UltCombo UltCombo closed this in ade3a55 Dec 31, 2014
@sindresorhus
Copy link
Owner

Reviewed. Superb work. And thanks for including the graph, nice touch!

687474703a2f2f7777772e7265616374696f6e676966732e636f6d2f77702d636f6e74656e742f67616c6c6572792f64616e63652d70617274792f74756d626c725f6d306f73727268714d6b31717a6376376e6f345f3235302e676966

@sindresorhus sindresorhus deleted the perf branch December 31, 2014 14:48
@sindresorhus
Copy link
Owner

🎆 And happy NYE! 🎆

@sindresorhus
Copy link
Owner

Wonder if there's more opportunities for perf improvements. Probably not, but the bench makes it a lot easier to explore, thanks for that :)

UltCombo added a commit that referenced this pull request Dec 31, 2014
@sindresorhus
Copy link
Owner

Weird. I'm seeing different results for globby async (working directory) and globby async (upstream/master) even though both are really master.

             354 op/s » globby async (working directory)
             384 op/s » globby async (upstream/master)
             283 op/s » globby sync (working directory)
             281 op/s » globby sync (upstream/master)

The sync version seems more stable though.

@UltCombo
Copy link
Contributor Author

@sindresorhus Happy new year to you too. :D

I'm seeing different results

That's odd indeed. Did you update the globby dependency after merging?

rm -rf node_modules && npm i

Ah, and last time I checked, npm had a caching issue -- when pulling from a GitHub repo, it would cache the repository's package.json version as if it was the real version release, then next time npm install pulls from the same GitHub repo, it would see the package.json version is still the same and use the old cached version. I'm not sure if this is fixed, but do a npm cache clean globby just to be sure.

@UltCombo
Copy link
Contributor Author

UltCombo commented Jan 1, 2015

@sindresorhus

Wonder if there's more opportunities for perf improvements.

Oh, I just found one more small optimization we can make (see gulpjs/glob-stream#33), but it shouldn't matter much. I'll push it to globby later if it doesn't make the code too ugly.

I believe the next big optimization would be to use node-glob's ignore API, but that is not implemented yet. ;)

As for benchmarking, I'll open an issue to discuss some possible improvements.

@UltCombo
Copy link
Contributor Author

UltCombo commented Jan 1, 2015

e402d9f

@sindresorhus
Copy link
Owner

Did you update the globby dependency after merging?

Yes, ofc. npm caches branches so you kinda have to do it.

I believe the next big optimization would be to use node-glob's ignore API, but that is not implemented yet. ;)

That will help a lot! :)

Seeing how Isaac don't want to support multiple patterns natively it seems that this module will have to stay.

@UltCombo
Copy link
Contributor Author

UltCombo commented Jan 1, 2015

Oh I'm also running into the same perf diff issue, looks like the first bench is up to 10% slower for some reason.

If I add another benchmark test before the actual benchmarks then it evens out the difference a bit:

                     negative globs (100 negated paths)
            227 op/s » warm up (ignore)
            240 op/s » globby async (working directory)
            244 op/s » globby async (upstream/master)

@UltCombo
Copy link
Contributor Author

UltCombo commented Jan 2, 2015

I fiddled a bit with benchmark.js, and according to it, upstream/master is 10% slower hahah.
This benchmarking stuff is too much headache.

@sindresorhus
Copy link
Owner

@UltCombo hmm, maybe it has something to do with the filesystem/harddrive cache.

@UltCombo
Copy link
Contributor Author

UltCombo commented Jan 2, 2015

That's the most likely yeah, seeing as filesystem access is the biggest bottleneck here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiple positive globs handling is slow Negative globs handling is very slow
2 participants