Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(vcs): overflow error when repo contains large number of files #1165

Merged
merged 3 commits into from
Sep 9, 2019

Conversation

edvald
Copy link
Collaborator

@edvald edvald commented Sep 6, 2019

What this PR does / why we need it:

We now run fewer git commands to list files in a directory, and avoid
having to list every file before later filtering out ignored files.
We also use streaming to avoid any buffering issues, and for improved
efficiency.

Which issue(s) this PR fixes:

Fixes #1162

@edvald edvald requested a review from eysi09 September 6, 2019 19:46
@eysi09
Copy link
Collaborator

eysi09 commented Sep 9, 2019

Does the change anything for #1097?

@edvald
Copy link
Collaborator Author

edvald commented Sep 9, 2019

No, this doesn't address that issue.

Copy link
Collaborator

@eysi09 eysi09 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a few comments and questions but overall looks good and feels way more robust than our previous implementation. Good job!


// We push to the output array when all ls-files commands "agree" that it should be included,
// and it passes through the include/exclude filters.
if (paths[resolvedPath] === this.ignoreFiles.length && matchPath(filePath, include, exclude)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you help me understand this: paths[resolvedPath] === this.ignoreFiles.length? (It's early Monday morning, so I'm a little slow.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NVM, I get it. The control flow is a little hard to follow but I'm not sure if there's a way to make it simpler without sacrificing perf.

}
}

// We run ls-files for each ignore file and collect each return result line with `handleLine`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add to the comment that this is the function that actually triggers the population of the output array? It's a little confusing because we're mapping over this.ignoredFiles to get the actual files (since that's how the git command works).

// if we get 128 we're not in a repo root, so we get no files
if (err.exitCode !== 128) {
throw err
// List modified files, so that we can ensure we have the right hash for them later
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be useful to document the sequencing above the function since the control flow is a little hard to follow.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless, we should at least add a comment saying that this function returns files filtered against ignore files, includes and excludes, since the name is pretty generic.

// We run ls-files for each ignoreFile and do a manual set-intersection (by counting elements in an object)
// in order to optimize the flow.
const paths: { [path: string]: number } = {}
const output: VcsFile[] = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to files or similar?

We now run fewer git commands to list files in a directory, and avoid
having to list every file before later filtering out ignored files.
We also use streaming to avoid any buffering issues, and for improved
efficiency.

Fixes #1162
@edvald edvald force-pushed the fix-large-getfiles branch from 5d71306 to 4f8a8a8 Compare September 9, 2019 12:59
@edvald edvald merged commit 4f5fabc into master Sep 9, 2019
@edvald edvald deleted the fix-large-getfiles branch September 9, 2019 14:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GitHandler.getFiles() crashes on very long file list
2 participants