Implemented RecursiveFilterIterator to improve exclude file performance #620
Conversation
Note that the tests pass, but the test suite doesn't pass on master.
I really don't want to refactor how the directory is scanned in the 2.x version because it has been stable for too long. You might want to look at the 3.x branch, where I've already done some refactoring around how directories are searched. For 2.x, it would be best if you could just tell me which bit of your code made the most difference, or re-submit a PR with the smallest changeset possible to achieve the performance increase.

On a more general note, if you have 1800 exclude patterns, then you are clearly in need of a completely different feature to help you manage things. If you give me more of an idea of what you are doing, maybe we can design a feature together that makes life easier for you.

Also, the reason the build is failing is that you have coding standard errors in the PR. It's not a big deal; I would need to add comments to your new class if I merged this PR anyway.
Sure thing, I'll look into the 3.0 dev branch.

So let me explain the use case here. We have a codebase that currently uses a subset of the PSR1/2 standard, with a bunch of exclusions and custom sniffs (snake case instead of camel case, for example). We're talking about moving to a full PSR2 standard without the exclusions. Obviously, things would blow up if we just switched, so we want a way to run one ruleset across the codebase as it currently stands, and run the new ruleset against new files only. In order to do that, we need to produce an index of all the current files, use that for one ruleset, and exclude it from the other ruleset. Currently, the only way to do this is to include those files in a ruleset using … The optimizations to the …

The other issue with this approach is that you can't process the entire codebase in one pass; you have to run phpcs with two different commands, one using the legacy ruleset and one using the new ruleset.

Now, if we're talking about adding features for 3.0, a really useful feature would be the ability to have two separate standards and run each over a different set of files. An alternative approach would be to allow the file to contain a docblock directive that would inform phpcs which standard/ruleset to use. Thoughts?
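To make that setup concrete, here is a rough sketch of the kind of ruleset being described. The ruleset name and file paths are invented for illustration; only the `<rule>` and `<exclude-pattern>` elements are standard PHPCS ruleset syntax:

```xml
<!-- new-ruleset.xml: full PSR2, meant for new files only.            -->
<!-- Every pre-existing file has to be listed as an exclude-pattern,  -->
<!-- which is what balloons to ~1800 entries.                         -->
<ruleset name="NewStandard">
    <rule ref="PSR2"/>
    <exclude-pattern>src/Legacy/UserModel\.php</exclude-pattern>
    <exclude-pattern>src/Legacy/OrderModel\.php</exclude-pattern>
    <!-- ...one entry per existing file... -->
</ruleset>
```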
Additionally, the new class optimizes filter iteration performance: if a file is to be excluded, the iterator never iterates over it, and if a directory is to be excluded, ALL of its subdirectories are skipped as well. The current implementation iterates over every subdirectory and performs a regex match check for every file in that subdirectory, which is pointless if the directory is being ignored.
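A minimal sketch of that pruning behaviour (not the PR's actual class; the project path and exclude list are invented):

```php
<?php
// Rejecting a directory in accept() prunes its entire subtree, so
// RecursiveIteratorIterator never descends into it at all.
class ExcludeFilter extends RecursiveFilterIterator
{
    private static $excludes = array(
        '/project/vendor',
        '/project/src/Legacy',
    );

    public function accept()
    {
        return in_array($this->current()->getPathname(), self::$excludes) === false;
    }
}

$dirs  = new RecursiveDirectoryIterator('/project', FilesystemIterator::SKIP_DOTS);
$files = new RecursiveIteratorIterator(new ExcludeFilter($dirs));

foreach ($files as $file) {
    // Only files outside the excluded directories ever reach this loop.
    echo $file->getPathname(), PHP_EOL;
}
```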
Thanks for describing your use case. It makes sense why you've optimised things the way you have, but it is only optimising for a very specific case, and I'm sure we can figure out a better way of doing things for you.

One obvious suggestion would be to tell PHPCS which files to process instead of which files to exclude. If you're already generating an excluded file list, outputting the opposite seems like it might be a simple thing to do. Writing a wrapper around PHPCS is another option, so you can inject the file list (presumably stored next to the rulesets) instead of including the files in a ruleset.

Ultimately, you're probably going to want some sort of Filter class that you can write and supply to PHPCS. I've described that concept briefly in another conversation about result caching in 3.0 here: #530. The idea is that once PHPCS has a list of files to process, a series of built-in filters can be applied (things like regex exclude patterns, modification date before/after x, git status) and/or custom filters can be applied for very custom filtering rules.

If this sounds like it might be something that could work for you, let's use this issue to figure something out and we can get some code going. The caching stuff on that other issue is pretty much done, so that's a lot of refactoring that's already been done.
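A hypothetical sketch of that "series of filters" idea; none of these class or function names come from PHPCS itself:

```php
<?php
// Each filter answers one question about a path; the idea is that
// built-in and custom filters get applied in series to the file list.
interface FileFilter
{
    public function accept($path);
}

// Built-in style filter: regex exclude patterns.
class RegexExcludeFilter implements FileFilter
{
    private $patterns;

    public function __construct(array $patterns)
    {
        $this->patterns = $patterns;
    }

    public function accept($path)
    {
        foreach ($this->patterns as $pattern) {
            if (preg_match($pattern, $path) === 1) {
                return false;
            }
        }
        return true;
    }
}

// Built-in style filter: modification date after x.
class ModifiedAfterFilter implements FileFilter
{
    private $timestamp;

    public function __construct($timestamp)
    {
        $this->timestamp = $timestamp;
    }

    public function accept($path)
    {
        return filemtime($path) > $this->timestamp;
    }
}

// A file survives only if every filter in the series accepts it.
function applyFilters(array $files, array $filters)
{
    return array_filter($files, function ($path) use ($filters) {
        foreach ($filters as $filter) {
            if ($filter->accept($path) === false) {
                return false;
            }
        }
        return true;
    });
}
```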
The only reason I'd shy away from producing a file list is that once we switch standards, the excludes will stay fixed; all new files will then conform to the new standard. I don't want scripts that we have to run every time a new file is added; that just won't work for our org. We basically want an include list for the old ruleset and an exclude list for the new ruleset.

The current implementation is very wasteful when doing non-pattern-based matches. Additionally, the current method tests all files in a subdirectory against a pattern match even if the directory is excluded. The filter ensures that files in an excluded directory aren't even processed. Much more efficient.
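A small sketch of the cost difference being described (paths are invented; PHPCS's real pattern handling may differ in delimiters and modifiers):

```php
<?php
// Pattern-based: every file is tested against every pattern, so a
// full scan costs roughly (files x patterns) preg_match() calls.
function isExcludedByPattern($path, array $patterns)
{
    foreach ($patterns as $pattern) {
        if (preg_match('`'.$pattern.'`i', $path) === 1) {
            return true;
        }
    }
    return false;
}

// Exact-match: flip the exclude list into a hash once, then each
// file costs a single O(1) isset() lookup.
$excludes = array('/project/src/Legacy/Foo.php', '/project/src/Legacy/Bar.php');
$index    = array_flip($excludes);

var_dump(isset($index['/project/src/Legacy/Foo.php'])); // bool(true)
var_dump(isset($index['/project/src/NewFeature.php'])); // bool(false)
```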
Any update on this? |
Finding you a good solution is still on my todo list, but I don't have time at the moment.
@gsherwood Any progress on this? |
None. Still on my todo list though. |
Related commit: 4315292 |
@gsherwood Looking sharp. The git diff one is good; we currently use this via a custom script that comments back on GitHub! The reason we wanted this all in the first place is to apply different sniffs to different files. For example, we want to apply one sniff to new files and a different one to old files, so we made an index of all current files and exclude those from the new sniff, but we can't do the inverse.
@oligriffiths Sorry, I didn't have time to explain the change yesterday and left that comment as a note for myself.

I am building that filtering feature with the ability to have your own custom filter classes, like you can with reports. One of the options we discussed in this PR is for you to have a custom filter so you can do whatever you need to do. For your specific case, you'll probably not make use of any PHPCS ignore patterns and instead use a filter (one for each of your standards) where you can do exact match checking on the file paths, or create something completely custom that doesn't even extend the PHPCS base Filter class. That keeps PHPCS filtering pretty basic while letting you do something more efficient for your specific case.

But I haven't done any dev on custom filters yet. Shouldn't take too long though. (Edit: it's done now)
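For illustration, a sketch of the kind of custom filter being described, doing exact-match checks against a pre-built index. The class name and index file are made up, and the real 3.0 filter API may differ; this only shows the shape of the idea, wrapping a RecursiveDirectoryIterator as in the earlier sketch:

```php
<?php
// Accepts only files that are NOT in the legacy index, i.e. the set
// the new standard should run over. Flipping the final check gives
// the companion filter for the old standard.
class NewFilesFilter extends RecursiveFilterIterator
{
    private static $index = null;

    public function accept()
    {
        if (self::$index === null) {
            // One file of exact paths, flipped once for O(1) lookups.
            $paths       = file('/project/legacy-files.txt', FILE_IGNORE_NEW_LINES);
            self::$index = array_flip($paths);
        }

        // Always descend into directories; only filter the files.
        if ($this->current()->isDir() === true) {
            return true;
        }

        return isset(self::$index[$this->current()->getPathname()]) === false;
    }
}
```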
Another related commit: f90e08c |
Awesome. Sounds good. I like the direction.
Closing this in favour of custom filter classes in 3.0. 3.0.0a1 is now released, so it can be tested. |
The current exclude mechanism is very slow for a large number of excludes.
For each file to test, the entire excludes array is iterated over and a regex is performed.
For our codebase, this takes over a minute for a single test (we have 1800 excludes).
The reason we have so many excludes is that we want to switch from one coding standard to another, so we include all the current files in one standard and exclude them in another.
This implementation uses a RecursiveFilterIterator and an improved exclude feature to speed up the process. It now takes just over 1 second instead of 1 minute to run our test suite.