Scanning a project with many DLLs is slow #3455

Open
KeylinxTobias opened this issue Nov 18, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@KeylinxTobias

Our build environment is disconnected from the Internet for security reasons, and we want to be able to produce a non-enriched SBOM using Syft. However, when running Syft, the scanning is very slow. Using verbose logging, we can see that some DLLs take a lot of time. We have checked the configuration options and tried to disable as much as possible, but it is still slow. It seems that when scanning each file, Syft tries to reach the Internet to check something and then times out? Producing the SBOM for one of our typical solutions can take about an hour. With other scanners like Trivy, we are down to seconds.

We are mainly using NuGet packages in our solution and are using Windows servers for our build environment.

How to reproduce:

  1. Download syft.exe and its files, and copy them to an offline machine
  2. Run a scan on a dotnet solution

Thanks in advance.

@KeylinxTobias KeylinxTobias added the bug Something isn't working label Nov 18, 2024
@wagoodman
Contributor

An initial glance shows that what's taking time is the dotnet-portable-executable-cataloger. The dotnet-deps-cataloger is clocking in rather fast (only seconds for large projects such as dotnet/sdk and dotnet/roslyn). On the other hand, looking specifically at the SDK project, after running ./build.sh we get a lot of DLLs:

find . | grep dll | wc -l                       
   36335

Taking a long time to read that many DLLs makes sense; however, looking at the raw syft results, I think a few things need to be verified:

❯ time syft .     
 ✔ Indexed file system                                                                                                                                                                                    .
 ✔ Cataloged contents                                                                                                                      cdb4ee2aea69cc6a83331bbe96dc2caa9a299d21329efb0336fc02a82e1839a8
   ├── ✔ Packages                        [46,897 packages]  
   ├── ✔ File digests                    [36,540 files]  
   ├── ✔ File metadata                   [36,540 locations]  
   └── ✔ Executables                     [36,689 executables]  
...
System.Diagnostics.DiagnosticSource                                            9.0.24.52809                                                       dotnet                  (+1 duplicate)      
System.Diagnostics.EventLog                                                    10.0.0-alpha.1.24531.8                                             dotnet                  (+2 duplicates)     
System.Diagnostics.EventLog                                                    10.0.0-alpha.1.24565.3                                             dotnet                  (+102 duplicates)   
System.Diagnostics.EventLog                                                    10.0.24.53005                                                      dotnet                  (+3 duplicates)     
System.Diagnostics.EventLog                                                    10.0.24.53108                                                      dotnet                  (+7 duplicates)     
System.Diagnostics.EventLog                                                    10.0.24.55105                                                      dotnet                  (+3 duplicates)     
System.Diagnostics.EventLog                                                    10.0.24.56503                                                      dotnet                  (+145 duplicates)   
System.Diagnostics.EventLog                                                    7.0.0                                                              dotnet                  (+30 duplicates)    
System.Diagnostics.EventLog                                                    8.0.0                                                              dotnet                  (+5 duplicates)     
...
syft .  173.88s user 25.37s system 119% cpu 2:46.38 total

I see a lot of packages (46,897!), and many of them have multiple duplicates (some in the hundreds!). We try to keep distinct project dependency graphs, so this may well be correct, but I think it would be worth double-checking this result too.

Back to the topic at hand: performance... I don't see how we could scan that many DLLs in only seconds, so that doesn't seem like the right target here. The deps.json cataloger also has a lot to work with once a build has been performed:

$ find . | grep deps.json | wc -l
     354
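The counts above use `find . | grep …`, which matches the substring anywhere in a path (so directory names containing "dll" are counted too) and, being case-sensitive, skips upper-case `.DLL` names. The numbers are therefore approximate. A slightly stricter sketch is below; `count_by_suffix` is a hypothetical helper for illustration, not part of syft, and it assumes a `find` that supports `-iname` (GNU, BSD/macOS and BusyBox do).

```shell
# Hypothetical helper (not part of syft): count regular files under a
# directory whose names end in the given suffix, ignoring case.
# Note: -iname is widely supported but is not required by POSIX.
count_by_suffix() {
  root=$1
  suffix=$2
  # -type f counts only regular files, not directories;
  # -iname matches the file name only, so a directory such as
  # "dllhelpers/" in the path does not inflate the count.
  find "$root" -type f -iname "*$suffix" | wc -l | tr -d ' '
}

# Usage, from the repository root after a build:
#   count_by_suffix . .dll
#   count_by_suffix . .deps.json
```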

And when we ignore DLLs (and thus look purely at the deps.json files), we see much better performance:

❯ time syft . --exclude '**/*.dll'                                    
 ✔ Indexed file system                                                                                                                                                                                    .
 ✔ Cataloged contents                                                                                                                      cdb4ee2aea69cc6a83331bbe96dc2caa9a299d21329efb0336fc02a82e1839a8
   ├── ✔ Packages                        [10,787 packages]  
   ├── ✔ File digests                    [430 files]  
   ├── ✔ File metadata                   [430 locations]  
   └── ✔ Executables                     [497 executables]  
[0000]  WARN no explicit name and version provided for directory source, deriving artifact ID from the given path (which is not ideal)
NAME                                                                           VERSION                     TYPE                                      
.                                                                              10.0.100-dev                dotnet                  (+7 duplicates)    
.NET Host                                                                      6.0.4                       dotnet                                     
Argon                                                                          0.17.0                      dotnet                  (+1 duplicate)     
ArgumentForwarding.Tests                                                       10.0.100-dev                dotnet                                     
ArgumentsReflector                                                             10.0.100-dev                dotnet                  (+2 duplicates)    
Castle.Core                                                                    5.1.1                       dotnet                  (+16 duplicates)   
ConsoleDemoWithCasing                                                          1.0.0                       dotnet                  (+2 duplicates)    
DiffEngine                                                                     15.4.2                      dotnet                  (+1 duplicate)     
DiffPlex                                                                       1.5.0                       dotnet                                     
DiffPlex                                                                       1.7.2                       dotnet                  (+1 duplicate)     
DotNetWatchTasks                                                               10.0.100-dev                dotnet                  (+1 duplicate)     
DumpMinitool                                                                   17.1300.24.52301            dotnet                                     
DumpMinitool                                                                   17.1300.24.56301            dotnet                  (+4 duplicates)    
...
xunit.extensibility.core                                                       2.9.2                       dotnet                  (+50 duplicates)   
xunit.extensibility.execution                                                  2.9.2                       dotnet                  (+50 duplicates)   
xunit.runner.console                                                           2.9.2                       dotnet                  (+46 duplicates)   
xunit.runner.reporters                                                         2.9.2                       dotnet                  (+46 duplicates)   
xunit.runner.utility                                                           2.9.2                       dotnet                  (+46 duplicates)   
xunit.runner.visualstudio                                                      2.8.2                       dotnet                  (+46 duplicates)
A newer version of syft is available for download: 1.15.0 (installed version is 1.14.2)
syft . --exclude '**/*.dll'  7.50s user 3.08s system 99% cpu 10.665 total

As a workaround, @KeylinxTobias, can you try running syft with --exclude '**/*.dll' and report back whether it is both performant and accurate for you?
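For anyone trying the workaround from a POSIX shell (for example Git Bash on a Windows build server, which is an assumption about the setup), it can be wrapped in a small function. In the runs above, excluding DLLs cut wall-clock time from about 2:46 to about 11 s, but the package count also dropped from 46,897 to 10,787, so it is worth comparing both SBOMs for accuracy. `scan_without_dlls` and `SYFT_BIN` are hypothetical names for this sketch; only `--exclude` (used in this thread) and `-o` (syft's output-format flag) come from syft itself.

```shell
# Hypothetical wrapper (not part of syft) around the suggested workaround:
# scan a directory while excluding every DLL, so only non-DLL inputs
# (e.g. deps.json files) contribute packages.
# SYFT_BIN is an assumption of this sketch: point it at syft.exe or any
# other syft binary; it defaults to "syft" found on the PATH.
scan_without_dlls() {
  # Default to the current directory when no target is given
  if [ "$#" -eq 0 ]; then
    set -- .
  fi
  target=$1
  shift
  # Single quotes stop the shell from expanding the glob; syft evaluates it.
  # Any remaining arguments (e.g. -o cyclonedx-json) are passed through.
  "${SYFT_BIN:-syft}" "$target" --exclude '**/*.dll' "$@"
}
```

Example: `SYFT_BIN=/path/to/syft.exe scan_without_dlls . -o cyclonedx-json > sbom.json`. Passing extra arguments through unchanged keeps the wrapper from hiding any of syft's own options.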

Side note: I wonder if we should have a *.csproj cataloger? I'm not a dotnet developer, so this might be a bad suggestion for a syft enhancement.

@kzantow
Contributor

kzantow commented Nov 18, 2024

There is a request for a csproj cataloger: #1522

@TimBrown1611

Does this issue happen on Windows?

@popey
Contributor

popey commented Nov 25, 2024

@TimBrown1611 I don't believe this is a Windows-specific issue, no. Given @wagoodman ran ./build.sh, and as a fine upstanding gentleman, I presume he's running Linux (or macOS). The confusion often comes when talking about DLL files, as .NET on non-Windows operating systems also uses DLLs for shared libraries.

@tomersein
Contributor

Hi @popey!
I think I have a similar issue with this cataloger. Could you explain the workaround so I can check whether it works for me?
Are you saying we can exclude DLL files and the packages will still appear in the SBOM via other catalogers?
Thanks!

@kzantow
Contributor

kzantow commented Dec 11, 2024

For discussion:

This sounds like a user scanning a source directory that contains build artifacts. We don't want to include the build artifacts in the scan, since the build description files already tell us which packages we should report.

This is similar to a Maven project, where you have a pom.xml at the top level but also a target directory, where after a build is run, it is populated with some build artifacts that are not what the user is interested in reporting.

This could also be related to how we use different catalogers when scanning image sources versus directories; there may be some per-directory-tree differences here.

@rsphilk

rsphilk commented Jan 13, 2025

I have also observed this behavior on an offline build server. It could be traced back to repeated attempts to download certificate revocation lists for code-signing certificates. Of course, this happens only when the scanned content already contains signed binaries.
It was not possible to control that behavior, or to make syft use a proxy for this step via environment variables, so the only solution was to enable Internet access via a proxy.

@wagoodman wagoodman changed the title Syft scan in offline mode is slow Scanning a project with many DLLs is slow Jan 23, 2025
Projects
Status: No status
Development

No branches or pull requests

7 participants