Create utility for package version handling: parsing, normalization, comparison #119

Closed
pombredanne opened this issue Oct 8, 2019 · 22 comments

Comments

@pombredanne
Collaborator

pombredanne commented Oct 8, 2019

There are many different version styles and ways to express version ranges that we need to deal with:

  • semver, deb, rpm, pypi, ...: in general, system and application package managers may each use different version conventions
  • version ranges and specs are used when referencing dependencies in package manifests, but may also be used in advisories and vulnerability data (cvss, npm advisories, etc.)

We need to craft a utility module to deal with these consistently. For starters, in #101 the npm advisories use their own syntax to define these.
We may also need to deal with stored version ranges later.

This is based on this chat:
@NavonilDas wrote:

@pombredanne you can find an example at https://registry.npmjs.org/-/npm/v1/security/advisories?page=67 , with module name "electron"

@pombredanne wrote:

let me check: these are not versions but version ranges.
These would need to be resolved to actual concrete versions... The version ranges seem nice as they are more compact BUT they are not great for querying... so we would want to resolve these to actual version numbers
so having a simple function that takes such a version range for an npm package and returns a list of real versions is likely a great first step

@NavonilDas wrote:

if I understand correctly, suppose the version range is >=2.0.0 <2.0.8 then the real version should be 2.0.7 and fixed version should be 2.0.8 am I right?

@pombredanne wrote:

well it should be any version that exists and is between 2.0.0 and 2.0.8. you have to take it in small steps
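
A minimal sketch of that first step, assuming the list of published versions is already at hand and that versions are plain `X.Y.Z` strings without pre-release tags. The `known_versions` list and the function names are made up for illustration; nothing here is fetched from the npm registry:

```python
import operator

# Map each comparison operator string to the matching Python function.
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}


def version_key(version):
    """Turn a plain "X.Y.Z" string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


def split_constraint(constraint):
    """Split a constraint such as ">=2.0.0" into (operator, version)."""
    # Try two-character operators first so ">=" is not read as ">".
    for symbol in (">=", "<=", ">", "<", "="):
        if constraint.startswith(symbol):
            return symbol, constraint[len(symbol):]
    # A bare version means equality.
    return "=", constraint


def resolve_range(version_range, known_versions):
    """Return the known versions that satisfy every constraint in the range."""
    constraints = [split_constraint(c) for c in version_range.split()]
    matching = []
    for version in known_versions:
        key = version_key(version)
        if all(OPERATORS[op](key, version_key(bound)) for op, bound in constraints):
            matching.append(version)
    return matching


# Hypothetical list of published versions, for illustration only.
known_versions = ["1.9.9", "2.0.0", "2.0.3", "2.0.7", "2.0.8", "2.1.0"]
affected = resolve_range(">=2.0.0 <2.0.8", known_versions)
print(affected)
```

With this made-up list, every published version from 2.0.0 up to but not including 2.0.8 is reported as affected, not only 2.0.7.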

parse the version constraints: hint, check the pip code for this... maybe we can borrow code from it, as it could be close enough (and the || are a way to split too)

once you have these parsed, resolving the ranges may need to actually fetch the list of known versions for a given npm package
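
The parsing step could be sketched like this, assuming only the basic comparison operators appear (npm shorthand such as `^`, `~`, `x` wildcards and hyphen ranges is not handled). The name `parse_npm_range` is made up for this example and is not taken from pip or npm:

```python
import re

# An optional comparison operator followed by a version string.
CONSTRAINT = re.compile(r"(>=|<=|>|<|=)?\s*([0-9A-Za-z.+-]+)")


def parse_npm_range(version_range):
    """
    Return a list of alternatives; each alternative is a list of
    (operator, version) tuples that must all hold together.
    """
    alternatives = []
    # "||" separates alternatives: a version matches if any of them matches.
    for alternative in version_range.split("||"):
        constraints = []
        for op, version in CONSTRAINT.findall(alternative):
            # A bare version with no operator means equality.
            constraints.append((op or "=", version))
        alternatives.append(constraints)
    return alternatives


parsed = parse_npm_range(">=2.0.0 <2.0.8 || >=3.0.0 <3.0.2")
print(parsed)
```

Keeping the result as a list of lists leaves the resolution against real published versions as a separate step.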

not trivial :P
there are possibly some parsers out there too
Search for https://registry.npmjs.org/-/npm/v1/security/advisories in code on GitHub or searchco.de

See maybe https://github.com/search?l=JavaScript&q=https%3A%2F%2Fregistry.npmjs.org%2F-%2Fnpm%2Fv1%2Fsecurity%2Fadvisories&type=Code and

https://github.com/search?l=Java&q=https%3A%2F%2Fregistry.npmjs.org%2F-%2Fnpm%2Fv1%2Fsecurity%2Fadvisories&type=Code

And here are some pointers:

@sbs2001
Collaborator

sbs2001 commented Jan 5, 2020

We currently use semantic_version in /scrapers/npm.py. It seems fine, but I'm not sure whether it supports deb/rpm style versions, which include '-'.

@sbs2001
Collaborator

sbs2001 commented Jan 6, 2020

A repaste from chat.
After a little research on the proposed tools, here are my findings. About https://github.com/snyk/unified-range: I don't think they have an API good enough for our use; they don't provide an easy way to check whether a version belongs to a given version spec.

https://github.com/repology/libversion is written in C and, IMHO, also doesn't support NPM version specs out of the box

I think you should take a look at https://github.com/python-semver/python-semver they have documentation of supported methods here: https://github.com/python-semver/python-semver/blob/master/semver.py

Personally I think we should continue to use https://github.com/rbarrois/python-semanticversion

@sbs2001
Collaborator

sbs2001 commented Jan 16, 2020

https://github.com/dephell/dephell_specifier also seems promising

@pombredanne pombredanne changed the title Create utility for package version handling: parsing, nomalization, comparison Create utility for package version handling: parsing, normalization, comparison Nov 20, 2020
@sbs2001
Collaborator

sbs2001 commented Mar 4, 2021

After some experience with dephell, we want to switch to something else, mainly because:

  1. It is not maintained
  2. It does not take the context of the ecosystem into account

I suggest we create a wrapper library which dispatches the comparison according to the ecosystem.
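
A rough sketch of what such a dispatching wrapper could look like. The registry, the function names and the two simplified key functions are assumptions made for illustration; a real wrapper would delegate to ecosystem-specific libraries instead:

```python
import re


def semver_key(version):
    """Sort key for plain "X.Y.Z" versions (pre-release tags not handled)."""
    return tuple(int(part) for part in version.split("."))


def generic_key(version):
    """Fallback key: compare numeric runs as ints and other runs as text."""
    parts = re.findall(r"[0-9]+|[A-Za-z]+", version)
    # Tag each part so ints and strings are never compared to each other.
    return tuple((1, int(p), "") if p.isdigit() else (0, 0, p) for p in parts)


# Hypothetical registry: ecosystem name -> function building a sort key.
KEY_BY_ECOSYSTEM = {
    "npm": semver_key,
    "cargo": semver_key,
}


def compare_versions(ecosystem, left, right):
    """Return -1, 0 or 1, using the comparison rules of ``ecosystem``."""
    key = KEY_BY_ECOSYSTEM.get(ecosystem, generic_key)
    left_key, right_key = key(left), key(right)
    return (left_key > right_key) - (left_key < right_key)


print(compare_versions("npm", "2.0.7", "2.0.10"))
print(compare_versions("unknown", "1.2", "1.10"))
```

The caller only names the ecosystem, and adding support for another one means registering one more key function.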

@sbs2001
Collaborator

sbs2001 commented Mar 5, 2021

Here are different specs and the libs which will take care of them

  • semver: used by rust, npm, ruby. We will delegate this to https://github.com/python-semver/python-semver
  • python version spec: delegate to packaging.version
  • deb version ranges: delegate to https://github.com/TheClimateCorporation/python-dpkg
  • maven: delegate to https://github.com/dephell/dephell_specifier (not sure we encounter these anywhere)
  • rpms: https://pagure.io/rpmdevtools/blob/main/f/rpmdev-vercmp
  • the rest will be handled by https://github.com/repology/libversion

@pombredanne
Collaborator Author

Some comments:

@pombredanne
Collaborator Author

Here is a proposed approach:

  1. we need to define a spec for version specifiers and version ranges:

Universal version specifiers syntax:
<scheme>:<range>,<range>

For instance:
semver:1.2.3,>=2.0.0

With these operators and syntactic elements:

  • Each range is declared this way:

    • "=": Version equality operator. Implied if not present: "1.2.3" means the version is equal to "1.2.3", the same as "=1.2.3"
    • "!=": Version exclusion operator. Means the version should be excluded, as in "!=1.2.3"
    • "<=", ">=": Inclusive range operator such as "<=1.2.3" which means all versions less than or equal to "1.2.3"
    • "<", ">": Exclusive range operator such as "<1.2.3" which means all versions less than "1.2.3"
  • Multiple ranges can form a larger version range specifier, separated by commas,
    such as in ">=1.2.3,<2.0.0" which means all versions greater than or equal to "1.2.3" but less than "2.0.0"

  • Spaces are not significant and are removed in the canonical form: "!=1.2.3" and "! = 1.2.3" are equivalent.

  • The ordering of multiple ranges in a specifier is not significant. The canonical ordering is TBD.

  • If a version contains syntax characters (><=!,) they need to be quoted using the URL quoting rules.
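
The syntax rules above could be parsed along these lines. This is only a sketch under the rules as stated so far; the function names `parse_range` and `parse_specifier` are made up, and the canonical ordering (still TBD) is not applied:

```python
from urllib.parse import unquote

# Longest operators first so "<=" is not read as "<" followed by "=".
OPERATORS = ("<=", ">=", "!=", "<", ">", "=")


def parse_range(text):
    """Return an (operator, version) tuple from one range such as ">=1.2.3"."""
    for op in OPERATORS:
        if text.startswith(op):
            # Undo URL quoting of syntax characters inside the version.
            return op, unquote(text[len(op):])
    # No operator: equality is implied.
    return "=", unquote(text)


def parse_specifier(specifier):
    """Return (scheme, ranges) from a string such as "semver:1.2.3,>=2.0.0"."""
    # Spaces are not significant, so drop them before parsing.
    compact = "".join(specifier.split())
    # Only the first ":" separates the scheme from the ranges.
    scheme, _, ranges = compact.partition(":")
    return scheme, [parse_range(r) for r in ranges.split(",") if r]


scheme, ranges = parse_specifier("semver: 1.2.3, >= 2.0.0, ! = 3.0.0")
print(scheme, ranges)
```

Splitting on commas happens before unquoting, so a URL-quoted comma inside a version is not mistaken for a range separator.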

The comparison of two versions is scheme-specific. A version scheme covers both
a certain version and version ranges syntax and how two versions are compared.
Schemes are related to Package URL types in the sense that each Package URL
type is related to one version scheme, but multiple types can reuse the same scheme.
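
As an illustration of that relation between Package URL types and schemes, a lookup table could be sketched as follows. The type-to-scheme pairs and the scheme codes `pypi` and `rpm` are assumptions made for this example (the thread only names `generic`, `semver` and `debian`), loosely following the delegation list earlier in the thread:

```python
# Assumed mapping from Package URL type to version scheme, for illustration.
SCHEME_BY_PURL_TYPE = {
    "npm": "semver",
    "cargo": "semver",
    "gem": "semver",
    "pypi": "pypi",
    "deb": "debian",
    "rpm": "rpm",
}


def scheme_for_purl(purl):
    """Return the version scheme for a Package URL such as "pkg:npm/foo@1.0.0"."""
    # A purl looks like "pkg:<type>/<namespace>/<name>@<version>".
    if not purl.startswith("pkg:"):
        raise ValueError(f"Not a Package URL: {purl!r}")
    purl_type = purl[len("pkg:"):].partition("/")[0].lower()
    # Types without a known scheme fall back to the generic scheme.
    return SCHEME_BY_PURL_TYPE.get(purl_type, "generic")


print(scheme_for_purl("pkg:npm/electron@2.0.7"))
```

Several types share the `semver` scheme here, while each type maps to exactly one scheme.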

The known schemes and their codes are:

  2. we need a basic architecture to handle these.
    Here is an approach:
class BaseVersion:
    # each version value should be comparable e.g. implement functools.total_ordering

    scheme = None
    value = None

    def __init__(self, value):
        # keep the raw version string so that parse_version() can call cls(version)
        self.value = value

    def validate(self):
        """
        Validate that the version is valid for its scheme
        """
        raise NotImplementedError


class GenericVersion(BaseVersion):
    scheme = "generic"

    def validate(self):
        """
        Validate that the version is valid for its scheme
        """
        # generic implementation ...
        # use 1. of https://github.com/repology/libversion/blob/master/doc/ALGORITHM.md#core-algorithm
        #  Version is split into separate all-alphabetic or all-numeric components. All other characters are treated as separators. Empty components are not generated.
        #     10.2alpha3..patch.4. → 10, 2, alpha, 3, patch, 4


class DebianVersion(BaseVersion):
    scheme = "debian"

    def validate(self):
        """
        Validate that the version is valid for its scheme
        """
        # debian implementation ...


class SemverVersion(BaseVersion):
    scheme = "semver"

    def validate(self):
        """
        Validate that the version is valid for its scheme
        """
        # node-semver implementation ...

versions_classes_by_scheme = {
    "generic": GenericVersion,
    "semver": SemverVersion,
    "debian": DebianVersion,
    # ....
}


def parse_version(version):
    """
    Return a Version object from a scheme-prefixed string
    """
    if ':' in version:
        scheme, _, version = version.partition(':')
    else:
        scheme = 'generic'

    cls = versions_classes_by_scheme[scheme]
    return cls(version)


class VersionRange:
    # one of < > <= >= != or =
    operator = ""
    # the version string that the operator applies to
    range = ""

class VersionSpecifier:

    scheme = ""
    ranges = []

    @classmethod
    def from_version_spec_string(cls, value):
        """
        Return a VersionSpecifier built from a version spec string, prefixed by
        a scheme such as "semver:1.2.3,>=2.0.0"
        """
        raise NotImplementedError

    @classmethod
    def from_scheme_version_spec_string(cls, scheme, value):
        """
        Return a VersionSpecifier built from a scheme-specific version spec string and a scheme string.
        """
        raise NotImplementedError

    def __contains__(self, version):
        """
        Return True if this VersionSpecifier contains the ``version``
        Version object or scheme-prefixed version string. A version is contained
        in a VersionSpecifier if it satisfies all of its ranges.
        """
        if isinstance(version, str):
            version = parse_version(version)

        # .... magic happens here
        raise NotImplementedError

    def __str__(self):
        """
        Return this VersionSpecifier string using a canonical representation and our universal syntax.
        """
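        # TODO: sort to make canonical
        # each item of self.ranges is a VersionRange, so render it as "<operator><version>"
        ranges = ",".join(f"{r.operator}{r.range}" for r in self.ranges)
        return f"{self.scheme}:{ranges}"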
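
To make the comparable-version and containment ideas above concrete, here is a small runnable sketch. `SimpleVersion` and `contains` are made-up names, the component split follows only step 1 of the libversion algorithm quoted in the comments above, and the ordering of components is a simplification rather than libversion's actual rules:

```python
import functools
import operator
import re

OPERATORS = {
    "<=": operator.le,
    ">=": operator.ge,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
}


@functools.total_ordering
class SimpleVersion:
    """A version compared by its numeric and alphabetic components."""

    def __init__(self, value):
        self.value = value
        # Split into all-numeric or all-alphabetic components;
        # every other character is treated as a separator.
        self.components = re.findall(r"[0-9]+|[A-Za-z]+", value)

    def _key(self):
        # Simplification (not libversion's rules): numbers compare as ints
        # and sort after letters; tagging keeps ints and strings apart.
        return tuple(
            (1, int(c), "") if c.isdigit() else (0, 0, c.lower())
            for c in self.components
        )

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()


def contains(ranges, version):
    """Return True if ``version`` satisfies every (operator, version) range."""
    candidate = SimpleVersion(version)
    return all(
        OPERATORS[op](candidate, SimpleVersion(bound)) for op, bound in ranges
    )


print(SimpleVersion("10.2alpha3..patch.4.").components)
print(contains([(">=", "1.2.3"), ("<", "2.0.0")], "1.10.0"))
```

Only `__eq__` and `__lt__` are written by hand; `functools.total_ordering` derives the other comparison methods from them.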

@sbs2001
Collaborator

sbs2001 commented Mar 16, 2021

@pombredanne I'm doing the implementation at https://github.com/sbs2001/universal-versions , we can move it to nexB GH org once significant progress is made.

universal-versions is just a placeholder name and we can change it

@sbs2001
Collaborator

sbs2001 commented Mar 17, 2021

@pombredanne regarding memory/python-dpkg@1a7c022: I don't want the whole module since it does many other things. I want to extract the method https://github.com/memory/python-dpkg/blob/1a7c022bb0d4b631241c98e526f88f178c243dcc/pydpkg/__init__.py#L529 and all its dependencies.

How do I document its origins and license?

@pombredanne
Collaborator Author

@pombredanne regarding memory/python-dpkg@1a7c022: I don't want the whole module since it does many other things. I want to extract the method https://github.com/memory/python-dpkg/blob/1a7c022bb0d4b631241c98e526f88f178c243dcc/pydpkg/__init__.py#L529 and all its dependencies.

How do I document its origins and license?

Please use https://github.com/nexB/debian-inspector/ then, which is already a fork of the above: https://github.com/nexB/debian-inspector/blob/master/src/debut/version.py

I would rather avoid having multiple forks of the same code in our projects and we can evolve it as needed, so please see if you can use this.

Wrt. the general question of the documentation of origin and license, see https://github.com/nexB/debian-inspector/blob/master/src/debut/version.py.ABOUT and https://github.com/nexB/debian-inspector/blob/master/src/debut/version.py.LICENSE

@pombredanne
Collaborator Author

And if you want it in a dedicated library such as "debian-version" that's OK and we could extract it if needed as a shared library

@sbs2001
Collaborator

sbs2001 commented Mar 18, 2021

@pombredanne I could use https://github.com/nexB/debian-inspector/, but there are 2 issues with it.

  1. It misses a fix for version comparison at memory/python-dpkg@1a7c022
  2. It's pretty heavy, even python-dpkg is heavy. These libs have all the functionality to play with debs, which I don't care about in the context of universal-versions.

@sbs2001
Collaborator

sbs2001 commented Mar 19, 2021

And if you want it in a dedicated library such as "debian-version" that's OK and we could extract it if needed as a shared library

yes I would like that

@sbs2001
Collaborator

sbs2001 commented Mar 19, 2021

@pombredanne you're right about rpms,
I missed the fact that the rpm package manager would be required for using https://pagure.io/rpmdevtools/blob/main/f/rpmdev-vercmp

Using https://github.com/sassoftware/python-rpm-vercmp would be least painful for me.

@sbs2001
Collaborator

sbs2001 commented Mar 19, 2021

Implementing nuget and maven style intervals won't be necessary in the context of vulnerablecode: no data source uses them.

@sbs2001
Collaborator

sbs2001 commented Mar 19, 2021

In the context of vulnerablecode, only python, semver and deb versions are needed

@sbs2001
Collaborator

sbs2001 commented Mar 20, 2021

@pombredanne We don't need to implement maven and nuget style version ranges here. The data sources which provide data about vulnerable maven and nuget packages, i.e. GitHub, use simple version ranges.

I need only https://github.com/sassoftware/pymaven/blob/1e69f50dc9ce3ac1ae3980e070f09e2505aa89f7/pymaven/versioning.py#L343 .

@sbs2001
Collaborator

sbs2001 commented Mar 20, 2021

@pombredanne the issue with https://github.com/nexB/pymaven is the same as with debian-inspector. I want a piston and I'm importing a truck :)

@sbs2001
Collaborator

sbs2001 commented Mar 20, 2021

PHP composer most likely uses semver, see https://getcomposer.org/doc/articles/versions.md#testing-version-constraints , which mentions a semver validation utility.

See https://github.com/composer/semver ; composer is dependent on it, see https://packagist.org/packages/composer/semver/dependents?order_by=downloads

@sbs2001
Collaborator

sbs2001 commented Mar 26, 2021

@sbs2001
Collaborator

sbs2001 commented Apr 2, 2021

@sbs2001
Collaborator

sbs2001 commented Apr 26, 2021

Fixed via #421 and we have new utility https://github.com/sbs2001/univers.

We don't do normalization atm, because all the version comparisons we do are between packages with the same name, type, etc. (not the same version though :) )

@sbs2001 sbs2001 closed this as completed Apr 26, 2021