Repeated noseyparker scan invocations produce different results #32

Closed
bradlarsen opened this issue Feb 22, 2023 · 3 comments
@bradlarsen

It seems that it is related to the scan:

[screenshot]

I will create an issue for this later 😄

Originally posted by @Coruscant11 in #29 (comment)

@bradlarsen bradlarsen added the bug Something isn't working label Feb 22, 2023
@bradlarsen bradlarsen self-assigned this Feb 22, 2023
@bradlarsen
bradlarsen commented Feb 22, 2023

Okay, the screenshot above shows that running a fresh scan multiple times on an unchanging input produces different match counts.

I was able to reproduce this on a regular clone of cpython:

$ git clone https://github.com/python/cpython
$ cd cpython
$ rm -rf np.test && noseyparker scan -d np.test .
# ... note the summary numbers
$ rm -rf np.test && noseyparker scan -d np.test .
# ... note the summary numbers again: different than before!

I think what's going on is this:

  • The np.test datastore directory is created within the path to be scanned (.)
  • The input paths are enumerated, and the datastore's sqlite3 database is included in the list of files to scan
  • Many other plain files (not Git history) are also scanned, and their findings are reported and recorded in the sqlite3 database
  • By the time a scanner thread scans the sqlite3 database, there are some matches from those other files already recorded. These matches get detected and reported from the sqlite3 database.
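
The sequence above can be illustrated with a small, deterministic simulation. This is only a sketch of the suspected feedback loop, not Nosey Parker's code: the `scan` function, the `SECRET_...` pattern, the file names, and the in-memory "datastore" entry are all hypothetical stand-ins. The scan order is passed in explicitly to mimic the arbitrary order in which scanner threads reach the datastore file.

```python
import re

# Hypothetical stand-in for a secret-detection rule.
SECRET = re.compile(r"SECRET_[A-Z0-9]+")

# Hypothetical input files; the datastore is added by scan() itself.
FILES = {
    "a.py": "token = SECRET_AAA",
    "b.py": "keys = SECRET_BBB, SECRET_CCC",
}


def scan(order, files):
    """Scan files in the given order and return the total match count.

    The "datastore" entry models a results database that lives inside the
    scanned directory: matches from other files are written into it, and it
    is itself scanned like any other file.
    """
    contents = dict(files)
    contents["datastore"] = ""  # fresh, empty datastore for every scan
    total = 0
    for name in order:
        matches = SECRET.findall(contents[name])
        total += len(matches)
        if name != "datastore":
            # Record the findings, as the real scanner records them in sqlite3.
            contents["datastore"] += "\n".join(matches) + "\n"
    return total


early = scan(["datastore", "a.py", "b.py"], FILES)   # datastore still empty: 3
middle = scan(["a.py", "datastore", "b.py"], FILES)  # one recorded match re-detected: 4
late = scan(["a.py", "b.py", "datastore"], FILES)    # all recorded matches re-detected: 6
print(early, middle, late)
```

The same input yields 3, 4, or 6 matches depending only on when the datastore is scanned, which is consistent with the varying summary numbers seen in the reproduction.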

How to fix this: Nosey Parker scan already has an ignore mechanism to control which paths are excluded from scanning. The default set of ignore rules only ignores paths for certain Git internal files, which will be scanned using the Git history enumeration mechanism. This set of default rules could be dynamically updated to include the canonical path to the active datastore.
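
A minimal sketch of the proposed exclusion, written in Python for illustration only. It does not reflect Nosey Parker's actual ignore-rule implementation; the `enumerate_inputs` function and its parameters are hypothetical. The idea is to canonicalize the datastore path once and skip any enumerated file that is the datastore or lives beneath it.

```python
from pathlib import Path


def enumerate_inputs(root, datastore):
    """Yield the files under `root`, skipping anything inside `datastore`.

    Both paths are canonicalized first, so a relative datastore path such as
    `np.test` and an absolute or symlinked path to the same directory are
    treated identically.
    """
    root = Path(root).resolve()
    datastore = Path(datastore).resolve()
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        resolved = path.resolve()
        # Skip the datastore itself and every file beneath it.
        if resolved == datastore or datastore in resolved.parents:
            continue
        yield path
```

Comparing canonical paths rather than raw strings matters here because the datastore is typically given relative to the current directory (`-d np.test`) while enumerated paths may be spelled differently.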

@bradlarsen

@Coruscant11 I just pushed changes that should fix this surprising behavior you were seeing. Please reopen if the problem is not fixed for you.

@Coruscant11

Hello, it works perfectly well, and I can also see a huge decrease in false positives.
Thank you for your work!
