Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New: Added ignore option (fixes #115) #151

Merged
merged 1 commit into from
Feb 18, 2015

Conversation

vegetableman
Copy link
Contributor

fixes #115

In addition to minimatch check, the code checks if a particular pattern ends with a glob star **, if it does, adds the directory to the ignore list and stops traversal for the directory matching the pattern.

Also, if, for example, a/** is added to the ignore list, it removes a from the resulting match list.

@UltCombo
Copy link

UltCombo commented Jan 2, 2015

Odd how the only failing test is due to 2 paths out of order in Node 0.11 only.

Nice work, I'll start experimenting with this tomorrow hopefully.

//cc @contra @sindresorhus

@vegetableman
Copy link
Contributor Author

@UltCombo had those failed cases randomly thrown on my local as well but they may be due to race conditions mentioned in the README?. Travis is clear now.

@vegetableman
Copy link
Contributor Author

@UltCombo did u get a chance to test it?

@UltCombo
Copy link

UltCombo commented Jan 8, 2015

Sorry, hadn't had time to test it throughout. I just pushed a new branch to globby using node-glob's ignore option in the async API.

Ignoring some files in a directory through the ignore option is about two times slower than globby, looks like it is because you're calling the minimatch function inside a loop which parses the pattern again in each iteration. Ideally, only a single Minimatch class instance should be created for each ignore pattern. See how globby handles it currently.

Ignoring a whole dir (dir/**) is throwing a RangeError: Maximum call stack size exceeded, this might be a problem with the benchmark though.

@yocontra
Copy link

yocontra commented Jan 8, 2015

@UltCombo thanks for taking the time to benchmark it 👍

So I put $100 on bountysource for this but it was on the other issue #, not sure how to move it over here but I'll contact them about that.

@vegetableman vegetableman force-pushed the fix-ignore-branch branch 6 times, most recently from 13bf508 to 9f2958a Compare January 10, 2015 18:28
@vegetableman
Copy link
Contributor Author

@UltCombo Thanks for the benchmarks.

you're calling the minimatch function inside a loop which parses the pattern again in each iteration.Ideally, only a single Minimatch class instance should be created for each ignore pattern. See how globby handles it currently.

Creating only one instance now.

Ignoring a whole dir (dir/**) is throwing a RangeError: Maximum call stack size exceeded,

Using setImmediate to return the callback has fixed the issue. Ref:- logicalparadox/matcha#9

Although there may be minor diffs, but following are the results from the existing globby benchmarks 👍 :

matcha bench

                      negative globs (some files inside dir)
             229 op/s » globby async (working directory)
             223 op/s » globby async (upstream/master)
             241 op/s » globby sync (working directory)
             226 op/s » globby sync (upstream/master)
              59 op/s » glob-stream

                      negative globs (whole dir)
          10,075 op/s » globby async (working directory)
             368 op/s » globby async (upstream/master)
             236 op/s » globby sync (working directory)
             235 op/s » globby sync (upstream/master)
              66 op/s » glob-stream

                      multiple positive globs
              86 op/s » globby async (working directory)
              89 op/s » globby async (upstream/master)
              77 op/s » globby sync (working directory)
              74 op/s » globby sync (upstream/master)
              36 op/s » glob-stream


  Suites:  3
  Benches: 15
  Elapsed: 26,361.82 ms

@UltCombo
Copy link

Nice!
Btw, so far I've only updated the async API to use the ignore option, so you can ignore the sync results for now.

@vegetableman
Copy link
Contributor Author

Updated benchmarks on fixing globby sync:-

matcha bench

                      negative globs (some files inside dir)
             223 op/s » globby async (working directory)
             222 op/s » globby async (upstream/master)
             432 op/s » globby sync (working directory)
             254 op/s » globby sync (upstream/master)
              68 op/s » glob-stream

                      negative globs (whole dir)
          11,760 op/s » globby async (working directory)
             409 op/s » globby async (upstream/master)
          18,977 op/s » globby sync (working directory)
             244 op/s » globby sync (upstream/master)
              73 op/s » glob-stream

                      multiple positive globs
              98 op/s » globby async (working directory)
              96 op/s » globby async (upstream/master)
             516 op/s » globby sync (working directory)
              81 op/s » globby sync (upstream/master)
              37 op/s » glob-stream


  Suites:  3
  Benches: 15
  Elapsed: 23,045.84 ms

@UltCombo
Copy link

Oh, I was about to start working on that, thank you very much!
Mind sending a PR to globby's glob-ignore branch later?

@vegetableman
Copy link
Contributor Author

Sure. made a quick fix for now. updating it.

@UltCombo
Copy link

Thanks, no hurry. Sending a quick fix plus the Matcha fix is okay, @sindresorhus and I can clean it up afterwards. =]

@vegetableman
Copy link
Contributor Author

Have sent fix for sync. Matcha fix? Do you mean the setImmediate fix? It's part of node-glob.

@@ -74,6 +75,27 @@ function setopts (self, pattern, options) {
self.statCache = options.statCache || Object.create(null)
self.symlinks = options.symlinks || Object.create(null)

self.ignore = options.ignore || []

if(!Array.isArray(self.ignore)) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add space to match the style in the rest of the file.

@isaacs
Copy link
Owner

isaacs commented Jan 12, 2015

Sorry, I was unclear. I'm not saying that the optimization is not a requirement for release of this feature.

I'm saying it's premature in advance of correct behavior.

The first step should be to implement the contract correctly, and provide a mountain of tests for every conceivable edge case. Then add optimizations, each time also adding more tests to guarantee that the optimization's new edge cases don't break the contracts.

I'll review again tomorrow and see if I can poke any new holes in it ;)

@isaacs
Copy link
Owner

isaacs commented Jan 12, 2015

@vegetableman Thanks for your patience on this, btw. I know it's frustrating when you make a contribution and the author puts you through the ringer, but since this module is used by many high-profile fs libs, it's important to be really careful about anything that can introduce surprising edge cases. In the worst case, they end up depending on those bugs, and then when we fix them, it causes problems.

@vegetableman
Copy link
Contributor Author

Thanks @isaacs for your patience as well. You are awesome 👍 .

@vegetableman
Copy link
Contributor Author

Added more tests including tests for negation and brace-expansion to accelerate this PR.

@isaacs
Copy link
Owner

isaacs commented Jan 14, 2015

I don't think "negative ignores" is a good idea. Maybe we should just throw in that case. If you want something to not be ignored, then put it in the pattern. It opens the door for folks to do crazy stuff like glob('**', {ignore:['!thing-i-want', '!other-thing-i-want']}) so now we're into double-negative territory. If that's a worthwhile use-case, then let's just add multi-glob support with some sensible API surface.

Added one minor comment, will review in more detail later.

@vegetableman
Copy link
Contributor Author

Agreed. It's kind of awkward and the current functionality simply reverses an ignore and excludes other non-matching files. May be we could go the globby way of supporting multi-globs. Something like

glob(['a/b', 'b/c'])

It's worth noting that globby supports an interesting way of handling multi-globs and corresponding ignore list. Refer:- sindresorhus/globby#6.
Also, pushed the replace fix.

@UltCombo
Copy link

If that's a worthwhile use-case, then let's just add multi-glob support with some sensible API surface.

node-glob's lack of multi-glob support is the sole reason globby exists.

Also worth noting that glob-stream implements the very same logic as globby as well (albeit it consumes node-glob's EventEmitter API instead of the callback/sync ones). Multi-glob support + a streaming API (#70) should make glob-stream obsolete as well.

@yocontra
Copy link

multi-glob support and a streaming API would completely solve our issues and perf bottlenecks in gulp and I'm sure in a lot of other projects

@isaacs
Copy link
Owner

isaacs commented Jan 14, 2015

Let's discuss multi-glob and streaming in other issues, plz.

@isaacs
Copy link
Owner

isaacs commented Jan 14, 2015

@vegetableman Getting closer, but logic's still not quite right.

> g.sync('test/a/c/**')
[ 'test/a/c',
  'test/a/c/d',
  'test/a/c/d/c',
  'test/a/c/d/c/b' ]
> g.sync('test/a/c/*')
[ 'test/a/c/d' ]
> g.sync('test/a/c/**', {ignore:'test/a/c/*'})
[]

In this case, it should only ignore patterns that would be matched by test/a/c/*, that is, test/a/c/d.

However, as you can see, it's failing to even explore the child directories in test/a/c, because of the star handling logic. The expected value here would be:

> g.sync('test/a/c/**').filter(function(p) { return p !== 'test/a/c/d' })
[ 'test/a/c',
  'test/a/c/d/c',
  'test/a/c/d/c/b' ]

@vegetableman
Copy link
Contributor Author

Thanks @isaacs for finding the bug. Great catch 👍 .
Pushed fixes with test cases. Have removed the single star check causing the problem. Also, missed catching it before, since, I was passing absolute paths ( which was enabled on passing the cwd option), now only relative paths are passed.
Also, an exception is thrown now in case of negation in an ignore.

@vegetableman vegetableman force-pushed the fix-ignore-branch branch 4 times, most recently from e4fcfc0 to 19fa2e9 Compare January 22, 2015 06:24
@vegetableman
Copy link
Contributor Author

@isaacs The star handing logic has been simplified with support only for globstar, that is, ignore traversing if globstar, and minimatch check for the rest. I think this could be merged.

@isaacs isaacs merged commit 19fa2e9 into isaacs:master Feb 18, 2015
@isaacs
Copy link
Owner

isaacs commented Feb 18, 2015

Ok! Landed this on master.

@vegetableman I had a few other suggestions, which were mostly pretty minor, and I went ahead and just made the changes myself. Since we've gone back and forth a bunch on this, and you seemed to be getting something out of the process, I want to share the reasoning behind both:

  1. The two tests were merged into a single test that just lists cases and has less repeated code. This makes it a lot easier to add new tests, and reduces the likelihood that one day we'll add a new ignore test, and forget to update the sync test as well, etc.
  2. I split up the isIgnored function into isIgnored and childrenIgnored, in order to remove the boolean argument. In general, a boolean argument is often an indication that you probably are putting two different functions in the same place, and it'd be better to just have them go ahead and be two separate functions.
  3. I de-inlined the argument to ignore.map(), since it doesn't actually need access to the closure state.

Will release with a bumped minor later today. Might try to get a few other PRs in there as well.

@vegetableman
Copy link
Contributor Author

Thanks for the refactor @isaacs . Makes sense and is much more comprehensible.

@yocontra
Copy link

Oooooooohhhhhhh!

Waiting on a PR for glob-stream now to incorporate this. Thanks @isaacs

🎈

@vegetableman
Copy link
Contributor Author

@contra Have sent a bounty request. Waiting for you to accept it. 💸

@yocontra
Copy link

@vegetableman Sent - thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

consider to support options.ignore
4 participants