Repeated `noseyparker scan` invocations produce different results #32

bradlarsen · 2023-02-22T13:34:26Z

It seems that it is related to the scan:

I will create an issue for this later 😄

Originally posted by @Coruscant11 in #29 (comment)


bradlarsen · 2023-02-22T13:47:02Z

Okay, the screenshot above shows that running a fresh scan multiple times on an unchanging input produces different match counts.

I was able to reproduce this on a regular clone of cpython:

$ git clone https://github.com/python/cpython
$ cd cpython
$ rm -rf np.test && noseyparker scan -d np.test .
# ... note the summary numbers
$ rm -rf np.test && noseyparker scan -d np.test .
# ... note the summary numbers again: different than before!

I think what's going on is this:

1. The np.test datastore directory is created within the path to be scanned (.).
2. The input paths are enumerated, and the datastore sqlite3 database gets included in the list of files to scan.
3. There are many other plain files (not Git history) to scan, and their findings get reported and recorded to the sqlite3 database.
4. By the time a scanner thread scans the sqlite3 database, some matches from those other files are already recorded in it. These matches get detected again and reported from the sqlite3 database itself. How many are present at that moment depends on thread timing, so the counts vary from run to run.
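The feedback effect described above can be reproduced in miniature. The following is only a sketch, not Nosey Parker's code: the `TOKEN-...` pattern, the `matches` table layout, and the function names are all made up for illustration, and it assumes the datastore records matched content verbatim. It scans a database file that lives inside the scanned directory, once before and once after a finding from another file has been recorded.

```python
import re
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical rule for illustration only; not one of Nosey Parker's rules.
PATTERN = re.compile(rb"TOKEN-[0-9A-F]{12}")


def scan_file(path):
    """Return every match of PATTERN in the raw bytes of a file."""
    return PATTERN.findall(Path(path).read_bytes())


def demo_feedback():
    """Scan a database file before and after findings are recorded in it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # A plain input file containing one fake secret.
        (root / "config.txt").write_bytes(b"key = TOKEN-0123456789AB\n")

        # The datastore is created inside the directory being scanned.
        db_path = root / "datastore.db"
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE matches (path TEXT, content BLOB)")
        conn.commit()

        # Scanning the database first: nothing has been recorded yet.
        before = len(scan_file(db_path))

        # Record the finding from the plain file (assumed table layout).
        for match in scan_file(root / "config.txt"):
            conn.execute("INSERT INTO matches VALUES (?, ?)", ("config.txt", match))
        conn.commit()
        conn.close()

        # Scanning the database afterwards: the recorded bytes now match too.
        after = len(scan_file(db_path))
        return before, after


if __name__ == "__main__":
    before, after = demo_feedback()
    print("matches in datastore before recording:", before)
    print("matches in datastore after recording:", after)
```

In a multithreaded scan, the moment at which the database file is read relative to the inserts is not fixed, which would explain why the totals change between otherwise identical runs.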

How to fix this: `noseyparker scan` already has an ignore mechanism that controls which paths are excluded from scanning. The default ignore rules only cover certain Git internal files, whose content is scanned through the Git history enumeration mechanism instead. This set of default rules could be dynamically extended to include the canonical path of the active datastore.
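As a rough illustration of that idea (a Python sketch, not the Rust change that eventually landed), input enumeration can resolve the datastore location to a canonical path and prune it from the directory walk. The names `enumerate_inputs` and `demo_ignore` are hypothetical, and the sketch assumes the datastore is a directory, as np.test is in the reproduction.

```python
import os
import tempfile
from pathlib import Path


def enumerate_inputs(root, datastore):
    """Yield files under root, skipping everything inside the datastore."""
    # Canonicalize so that "./np.test" and an absolute path compare equal.
    ds = Path(datastore).resolve()
    for dirpath, dirnames, filenames in os.walk(root):
        here = Path(dirpath).resolve()
        # Editing dirnames in place stops os.walk from descending into it.
        dirnames[:] = [d for d in dirnames if (here / d).resolve() != ds]
        for name in filenames:
            yield here / name


def demo_ignore():
    """Build a tiny tree with a datastore inside it and list the inputs."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "src").mkdir()
        (root / "src" / "a.py").write_text("print('hi')\n")
        (root / "np.test").mkdir()
        (root / "np.test" / "datastore.db").write_bytes(b"")
        return sorted(p.name for p in enumerate_inputs(root, root / "np.test"))


if __name__ == "__main__":
    print(demo_ignore())
```

Comparing canonical paths rather than raw strings matters here because the datastore is usually given relative to the current directory, while the walk may report the same location under a different spelling.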

bradlarsen · 2023-02-22T23:05:13Z

@Coruscant11 I just pushed changes that should fix this surprising behavior you were seeing. Please reopen if the problem is not fixed for you.

Coruscant11 · 2023-02-27T09:59:18Z

Hello, it works perfectly well, and I can also see a huge decrease in false positives.
Thank you for your work!

bradlarsen mentioned this issue Feb 22, 2023

Add commits range option for scan in Git repositories #29

Open

bradlarsen added the bug Something isn't working label Feb 22, 2023

bradlarsen self-assigned this Feb 22, 2023

bradlarsen closed this as completed in 25088e8 Feb 22, 2023
