Syft generates invalid PURLs when name contains : #3577

Open
jkugler opened this issue Jan 9, 2025 · 7 comments
Labels
bug Something isn't working

Comments


jkugler commented Jan 9, 2025

What happened:
Syft found a .NET executable during a scan and extracted metadata from it, including the name. In this file, the name it found was TODO: <Product name>.

This generated a PURL like this:

pkg:nuget/TODO:%20<Product%20name>@1.0.0.1

As an aside, the bom-ref and cpe were:

"bom-ref": "pkg:nuget/TODO:%20<Product%20name>@1.0.0.1?package-id=8bba3601e9b8e7ac"
"cpe": "cpe:2.3:a:TODO\\:_\\<Product_name\\>:TODO\\:_\\<Product_name\\>:1.0.0.1:*:*:*:*:*:*:*"

This causes CycloneDX libraries such as https://github.com/CycloneDX/cyclonedx-python-lib to raise an exception, because this is an invalid PURL. Example:

>>> from packageurl import PackageURL
>>> PackageURL.from_string("pkg:nuget/TODO:%20<Product%20name>@1.0.0.1")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    PackageURL.from_string("pkg:nuget/TODO:%20<Product%20name>@1.0.0.1")
  File "/Users/tek30584/programming/merge_poc/.venv/lib/python3.12/site-packages/packageurl/__init__.py", line 508, in from_string
    raise ValueError(msg)
ValueError: Invalid purl 'pkg:nuget/TODO:%20<Product%20name>@1.0.0.1' cannot contain a "user:pass@host:port" URL Authority component: ''.

# NOTE: If I remove the `:`, it parses correctly:
>>> PackageURL.from_string("pkg:nuget/TODO%20<Product%20name>@1.0.0.1")
PackageURL(type='nuget', namespace=None, name='TODO <Product name>', version='1.0.0.1', qualifiers={}, subpath=None)
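
A possible explanation for the "URL Authority" wording, sketched with the standard library only. This is an assumption about the cause, not a reading of the packageurl source: generic URL splitting treats everything before the first literal `:` as a scheme, so the name is no longer seen as a plain path segment.

```python
from urllib.parse import urlsplit

# The part of the PURL after "pkg:nuget/", as it appears in the SBOM.
raw = "TODO:%20<Product%20name>@1.0.0.1"
# The same text with ':', '<' and '>' percent-encoded.
encoded = "TODO%3A%20%3CProduct%20name%3E@1.0.0.1"

raw_parts = urlsplit(raw)
encoded_parts = urlsplit(encoded)

# With a literal ':', the text before it is taken as a URL scheme.
print(repr(raw_parts.scheme), repr(raw_parts.path))
# With '%3A', no scheme is detected and the whole string stays in the path.
print(repr(encoded_parts.scheme), repr(encoded_parts.path))
```

Whether the PURL parser fails for exactly this reason would need to be confirmed against its code.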

What you expected to happen:
I expected Syft to generate a valid PURL by escaping all invalid characters.

Steps to reproduce the issue:
Scan a .NET file with a product name value of "TODO: <Product name>".

I am trying to get permission to attach the file to this ticket, and will attach it if and when I do.

Anything else we need to know?:

Environment:

  • Output of syft version:
Application: syft
Version:    1.18.1
BuildDate:  2024-12-13T18:41:10Z
GitCommit:  5e16e5031a13f8a11057feb8544decebfc43b4ed
GitDescription: v1.18.1
Platform:   darwin/arm64
GoVersion:  go1.23.4
Compiler:   gc
  • OS (e.g: cat /etc/os-release or similar):
$ sw_vers
ProductName:		macOS
ProductVersion:		14.7.1
BuildVersion:		23H222
jkugler added the bug label Jan 9, 2025

jkugler commented Jan 10, 2025

Ah, seems this might actually be a bug in https://github.com/anchore/packageurl-go


jkugler commented Jan 11, 2025

Oddly enough, anchore/packageurl-go seems to behave properly:

package main

import (
	"fmt"

	"github.com/package-url/packageurl-go"
)

func main() {
	// Build a PURL whose name contains ':', ' ', '<' and '>', then print its string form.
	instance := packageurl.NewPackageURL("test", "ok", "TODO: <Product name>", "version", nil, "")
	fmt.Println(instance.ToString())
}

gives me

pkg:test/ok/TODO%3A%20%3CProduct%20name%3E@version

Now I'm really confused as to why the : is making it into the PURL in the SBOM.


jkugler commented Jan 11, 2025

BTW, I tried using anchore/packageurl-go in Go Playground, but got this error:

go: finding module for package github.com/anchore/packageurl-go
go: downloading github.com/anchore/packageurl-go v0.1.0
go: found github.com/anchore/packageurl-go in github.com/anchore/packageurl-go v0.1.0
go: play imports
	github.com/anchore/packageurl-go: github.com/anchore/packageurl-go@v0.1.0: parsing go.mod:
	module declares its path as: github.com/package-url/packageurl-go
	        but was required as: github.com/anchore/packageurl-go


jkugler commented Jan 13, 2025

This is verified as a bug in anchore/packageurl-go. When I depend on anchore/packageurl-go, I get this:

pkg:test/ok/TODO:%20<Product%20name>@version

which is, of course, wrong.

But when I depend on github.com/package-url/packageurl-go I get:

pkg:test/ok/TODO%3A%20%3CProduct%20name%3E@version

which is correct.

I'll see what I can do to bring the fix into anchore/packageurl-go
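
For comparing the two outputs mechanically, here is a minimal sketch of a round-trip check. The helper name `is_strictly_encoded` is hypothetical, and "strict" here means the reading argued for in this thread (everything outside letters, digits and `-._~` is percent-encoded), which is an assumption rather than a confirmed rule of the spec.

```python
from urllib.parse import quote, unquote


def is_strictly_encoded(segment: str) -> bool:
    """Return True if decoding and re-encoding the segment reproduces it.

    Hypothetical helper: it flags any character outside the unreserved
    set that appears literally, and also flags lowercase hex escapes.
    """
    return quote(unquote(segment), safe="") == segment


# Name segments taken from the two outputs above.
fork_name = "TODO:%20<Product%20name>"
upstream_name = "TODO%3A%20%3CProduct%20name%3E"

print(is_strictly_encoded(fork_name))
print(is_strictly_encoded(upstream_name))
```

A check like this could sit in a test that runs over the PURLs in a generated SBOM.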


jkugler commented Jan 13, 2025

@wagoodman it seems you touched this last: https://github.com/anchore/packageurl-go/blob/master/packageurl.go#L515

Your comment here:

// - the ':' scheme and type separator does not need to and must NOT be encoded. It is unambiguous unencoded everywhere

While it's true that the : is needed as the scheme separator, it is invalid if it shows up unencoded anywhere else. So if a name contains :, the generated PURL is invalid. From my understanding of the PURL spec and of the code in https://github.com/anchore/packageurl-go/blob/master/packageurl.go, we never escape the entire string at once (i.e., we never escape a string such as pkg:/test/......); we only escape one element at a time, so the scheme separator is never at risk of being encoded.

I see the change to not escape : was introduced in anchore/packageurl-go@f4d2668#diff-43bc5bec26c575301328864f03daa71d047280defe6add7b901cfb259b9726c5 by way of a merge from master...but I'm not quite sure where that merge came from.

Any objection to reverting the escape function to https://github.com/anchore/packageurl-go/blob/master/packageurl.go#L515 or at least adding : and <> to the list of things escaped?

Thanks!


jkugler commented Jan 14, 2025

Given https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#how-to-build-purl-string-from-its-components I would say everything in each segment is escaped, thus we should not be skipping :, <>, etc.
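
To make that reading concrete, here is a simplified sketch of building a PURL one component at a time. `build_purl` is a hypothetical helper, not the API of any PURL library; it leaves out qualifiers, subpath and per-type normalization, and it assumes that every character outside the unreserved set gets encoded.

```python
from urllib.parse import quote


def build_purl(type_, namespace, name, version):
    """Hypothetical, simplified builder: each component is encoded on its own.

    The separators ("pkg:", "/", "@") are added as literals after encoding,
    so the scheme ':' is never passed through the escape step.
    """
    parts = ["pkg:", type_.lower(), "/"]
    if namespace:
        # Encode each namespace segment separately, then rejoin with '/'.
        segments = [quote(s, safe="") for s in namespace.strip("/").split("/")]
        parts += ["/".join(segments), "/"]
    parts.append(quote(name, safe=""))
    if version:
        parts += ["@", quote(version, safe="")]
    return "".join(parts)


purl = build_purl("test", "ok", "TODO: <Product name>", "version")
print(purl)
```

With the inputs from the earlier Go example, this produces the same string that github.com/package-url/packageurl-go printed.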


jkugler commented Jan 14, 2025

I'm not 100% sure this is correct:

		{
			name:     "characters are unencoded where allowed",
			input:    "pkg:type/%3E%41%22space/name@version?key=value!#sub/path",
			expected: "pkg:type/>A\"space/name@version?key=value!#sub/path",
		},

https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#rules-for-each-purl-component
says "Each namespace must be a percent-encoded string."
If that's the case, then shouldn't > and " be percent-encoded?

Also, for name, the spec says "A name must be a percent-encoded string." I would then expect that ! should also be encoded. Thus, this would be the expected string:

pkg:type/%3EA%22space/name@version?key=value%21#sub/path

Would you agree?
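
For reference, here is a rough sketch of how that expected string can be derived from the test input. `reencode_strictly` is a hypothetical helper that only handles this simple shape of PURL, and it assumes the strict reading above (everything outside letters, digits and `-._~` is encoded), which is the very point in question.

```python
from urllib.parse import quote, unquote


def strict(segment: str) -> str:
    # Decode first so that escapes like %41 collapse to 'A', then re-encode.
    return quote(unquote(segment), safe="")


def reencode_strictly(purl: str) -> str:
    """Hypothetical helper: split a simple PURL and re-encode each piece."""
    scheme, _, rest = purl.partition(":")
    rest, _, subpath = rest.partition("#")
    rest, _, qualifiers = rest.partition("?")
    path, sep, version = rest.rpartition("@")
    if not sep:  # no version present
        path, version = rest, ""

    out = scheme + ":" + "/".join(strict(s) for s in path.split("/"))
    if version:
        out += "@" + strict(version)
    if qualifiers:
        pairs = []
        for pair in qualifiers.split("&"):
            key, _, value = pair.partition("=")
            pairs.append(key + "=" + strict(value))
        out += "?" + "&".join(pairs)
    if subpath:
        out += "#" + "/".join(strict(s) for s in subpath.split("/"))
    return out


test_input = "pkg:type/%3E%41%22space/name@version?key=value!#sub/path"
print(reencode_strictly(test_input))
```

This is not the parsing algorithm from the spec, only enough splitting to show where > and " and ! would end up encoded.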
