[FEATURE] option to generate reproducible output #227

Closed
RodneyRichardson opened this issue May 24, 2022 · 9 comments
Labels
enhancement New feature or request


@RodneyRichardson
Contributor

I would like the tool to create exactly the same output if I run it on the same (Pipfile.lock) input file twice. This would make it easier to detect changes over time.

There are several places where the outputs differ:

  1. The bom-ref is a GUID, newly generated on each run. This could be the purl (as cyclonedx-dotnet appears to do).
  2. The order of externalReferences is not maintained.
  3. The order of components/libraries is not maintained.
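For item 1, here is a minimal sketch of the difference. It assumes a simplified, hypothetical `Component` with an optional `purl` string (not the library's real model), and the `name@version` fallback for components without a purl is likewise an assumption. A random UUID changes on every run, while a purl-derived bom-ref stays the same.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    """Hypothetical, simplified stand-in for a component model."""
    name: str
    version: str
    purl: Optional[str] = None


def random_bom_ref(component: Component) -> str:
    # A fresh UUID4 on every call: this is what makes consecutive runs differ.
    return str(uuid.uuid4())


def stable_bom_ref(component: Component) -> str:
    # Prefer the purl (as cyclonedx-dotnet appears to do).
    if component.purl:
        return component.purl
    # Assumed fallback when no purl is available.
    return f"{component.name}@{component.version}"
```

Calling `stable_bom_ref(Component("requests", "2.27.1", purl="pkg:pypi/requests@2.27.1"))` returns `"pkg:pypi/requests@2.27.1"` on every run. Two components sharing a purl would collide, so uniqueness still has to be ensured separately.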
@jkowalleck
Member

Pull requests are welcome.

Please review the existing tests, as they are already intended to produce identical results.

@RodneyRichardson
Contributor Author

@jkowalleck: Do you have a preference for whether the model uses sorted sets, or whether the output classes sort while generating output?

@RodneyRichardson
Contributor Author

RodneyRichardson commented May 24, 2022

FYI: The tests understandably ignore a few things when comparing outputs, namely the timestamp, serialNumber, ordering (JSON sorts before comparing, XML does XML comparison), and tool version.
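To illustrate what "ignoring" those fields amounts to, here is a minimal sketch of a comparison helper. It is not the project's actual test code: it assumes the BOM is a plain dict in CycloneDX 1.4 JSON layout (where `metadata.tools` is a list), and `comparable_bom` and `_canonical` are hypothetical names. It drops serialNumber, the timestamp, and the tool versions, then sorts every list so that element order no longer matters.

```python
import copy
import json
from typing import Any


def _canonical(value: Any) -> Any:
    """Recursively sort every list so element order no longer matters."""
    if isinstance(value, dict):
        return {key: _canonical(item) for key, item in value.items()}
    if isinstance(value, list):
        items = [_canonical(item) for item in value]
        # Sort by each element's canonical JSON text; works for mixed types.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return value


def comparable_bom(bom: dict) -> str:
    """Drop fields that legitimately differ between runs, then canonicalize."""
    data = copy.deepcopy(bom)  # never mutate the caller's document
    data.pop("serialNumber", None)
    metadata = data.get("metadata", {})
    metadata.pop("timestamp", None)
    tools = metadata.get("tools")
    if isinstance(tools, list):  # CycloneDX 1.4 layout assumed
        for tool in tools:
            if isinstance(tool, dict):
                tool.pop("version", None)
    return json.dumps(_canonical(data), sort_keys=True)
```

A reproducibility feature would have to make exactly these differences disappear from the real output, rather than from the comparison.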

@jkowalleck
Member

jkowalleck commented May 26, 2022

From my experience with other CycloneDX tools that have a feature to generate reproducible results, the following should be the requirements/acceptance criteria:

  • XML element order must stay valid to the XSD
    • e.g. valid: <bom><components/><dependencies/></bom>
    • e.g. invalid: <bom><dependencies/><components/></bom>
  • all lists/sets are sorted (it doesn't matter which sorting criteria are used, as long as the sorting is deterministic/consistent)
  • serialized bom.metadata.timestamp is omitted/empty
  • bom-ref from bom.metadata.component must be empty or consistent, e.g. "Component.{self.group}.{self.name}.{self.version}"
  • bom-ref from bom.components.* must be empty or consistent, e.g. "Component.{self.group}.{self.name}.{self.version}"
  • bom-ref from bom.services.* must be empty or consistent, e.g. "Service.{self.group}.{self.name}.{self.version}"
  • regarding sorted lists:
    • if a bom-ref is auto-generated to produce a dependency graph (bom.dependencies), the list must use consistent identifiers in a sorted way
  • JSON properties must be sorted (alphabetically)
  • data must not be mutated during normalization/serialization.
  • there is no intention to change the internal data structures, but sorted() can be called during normalization. Do you plan to use SortedSet from sortedcontainers?
  • SortedDict from sortedcontainers might help for the JSON output; alternatively, use a JSON serializer that does the sorting for you
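Here is a minimal sketch of several of these criteria on the JSON side. It assumes the BOM is already a plain dict in CycloneDX 1.4 JSON layout, and `serialize_reproducible`, `_component_key` and `_sorted_dependency` are hypothetical helpers, not part of the library. The sketch omits the timestamp, sorts components and the dependency graph, sorts properties via `json.dumps(sort_keys=True)`, and never mutates the input. Nested lists (externalReferences, licenses, nested components) and the XML output would need the same treatment and are not covered here.

```python
import json
from typing import Any, Dict


def _component_key(component: Dict[str, Any]) -> tuple:
    # Any deterministic key works; group/name/version mirrors the bom-ref pattern above.
    return (component.get("group", ""), component.get("name", ""), component.get("version", ""))


def _sorted_dependency(dep: Dict[str, Any]) -> Dict[str, Any]:
    result = dict(dep)  # copy, so the caller's entry is untouched
    if "dependsOn" in result:
        result["dependsOn"] = sorted(result["dependsOn"])
    return result


def serialize_reproducible(bom: Dict[str, Any]) -> str:
    """Dump a sorted copy of `bom`; every level that changes is rebuilt, not mutated."""
    out = dict(bom)  # shallow copy; the values replaced below are new objects
    if "metadata" in out:
        out["metadata"] = {k: v for k, v in out["metadata"].items() if k != "timestamp"}
    if "components" in out:
        out["components"] = sorted(out["components"], key=_component_key)
    if "dependencies" in out:
        out["dependencies"] = sorted(
            (_sorted_dependency(dep) for dep in out["dependencies"]),
            key=lambda dep: dep["ref"],
        )
    # sort_keys=True sorts the JSON properties alphabetically at every level.
    return json.dumps(out, sort_keys=True, indent=2)
```

Because sorting happens only on a copy at serialization time, this follows the "sorted() during normalization" route instead of changing the internal data structures.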

Making the output consistent is a non-default, opt-in feature, since it takes extra effort.
The feature could be enabled via an environment variable or a parameter to a normalization method.
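One possible shape for that switch is sketched below. Both the function and the `CDX_REPRODUCIBLE` variable name are hypothetical, since the thread does not name either. An explicit parameter wins, and the environment variable is only consulted when no parameter is given.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; the issue does not specify one.
REPRODUCIBLE_ENV_VAR = "CDX_REPRODUCIBLE"
_TRUTHY = {"1", "true", "yes", "on"}


def reproducible_enabled(
    flag: Optional[bool] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Explicit parameter first, then the environment variable, else off."""
    if flag is not None:
        return flag
    env = os.environ if environ is None else environ
    return env.get(REPRODUCIBLE_ENV_VAR, "").strip().lower() in _TRUTHY
```

The `environ` parameter exists only so the lookup can be tested without touching the real process environment.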

Here is an example from the JavaScript implementation, which recently got a reproducibility feature:

@RodneyRichardson
Contributor Author

I've had a look at various ways of doing this by sorting the output, and think I've settled on (initially) replacing set with SortedSet from sortedcontainers. This will get us part of the way there.

I won't look at BomRef yet, but I won't use it in the sort criteria either.

This won't change the JSON property order or the XML element order either, but I believe those should already be consistent.
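Without a `key` argument, SortedSet from sortedcontainers needs its elements to be comparable, and so does `sorted()` on a plain set. Below is a stdlib-only sketch of making a hypothetical model class orderable while keeping bom_ref out of the sort criteria. This `Component` is an illustration, not the library's class.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True, order=True)
class Component:
    """Hypothetical model. order=True generates __lt__ and friends, comparing
    the fields group, name, version in declaration order."""
    group: str
    name: str
    version: str
    # compare=False keeps the (possibly random) bom-ref out of ordering,
    # equality and hashing, so it cannot influence the sort.
    bom_ref: Optional[str] = field(default=None, compare=False)


components = {
    Component("", "urllib3", "1.26.9", bom_ref=str(uuid.uuid4())),
    Component("", "requests", "2.27.1", bom_ref=str(uuid.uuid4())),
    Component("acme", "idna", "3.3", bom_ref=str(uuid.uuid4())),
}

# Deterministic order regardless of set iteration order or random bom-refs.
ordered = sorted(components)
```

One consequence of `compare=False` is that two components with the same group, name and version but different bom-refs are equal and collapse into one set entry. That may or may not be desirable for the real model.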

@jkowalleck
Member

jkowalleck commented May 26, 2022

bom-refs are the key to bom.dependencies, a.k.a. the dependency graph.
It's one of the many parts of the output; you might stumble upon it when you refactor the serializers/normalizers.

According to the CycloneDX spec, the bom-ref can be any string, as long as it is unique among all bom-refs in one document (the XML XSD even enforces uniqueness).

Nevertheless, give it a try and see if you can hack the feature in :)
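Here is a minimal sketch of checking that uniqueness rule on JSON output. It assumes the BOM is available as a JSON-style dict, and `duplicate_bom_refs` is a hypothetical helper. It walks the whole document, so metadata.component, nested components and services are all covered. Entries in bom.dependencies use "ref" rather than "bom-ref" and are therefore not counted. For XML, the XSD's own enforcement remains the authoritative check.

```python
from collections import Counter
from typing import Any, Dict, Iterator, List


def _iter_bom_refs(node: Any) -> Iterator[str]:
    """Yield every "bom-ref" value found anywhere in a JSON-like structure."""
    if isinstance(node, dict):
        ref = node.get("bom-ref")
        if isinstance(ref, str):
            yield ref
        for value in node.values():
            yield from _iter_bom_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from _iter_bom_refs(item)


def duplicate_bom_refs(bom: Dict[str, Any]) -> List[str]:
    """Return bom-refs that occur more than once, sorted for stable output."""
    counts = Counter(_iter_bom_refs(bom))
    return sorted(ref for ref, n in counts.items() if n > 1)
```

An empty result means the deterministic bom-ref scheme did not introduce collisions in that document.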

@jkowalleck
Copy link
Member

One addition: the output must be valid according to the spec version in use. See https://github.com/CycloneDX/specification/tree/master/schema
There is an internal JSON schema validator in place that should already be used during testing; it will help you check that the output format is still valid.
I think the XML schema validator is not finished yet; in the meantime, you might want to use an external XSD validator.

@jkowalleck jkowalleck added the enhancement New feature or request label May 26, 2022
@jkowalleck jkowalleck changed the title from "Output is different on consecutive runs" to "[FEATURE] option to generate reproducible output" May 26, 2022
@RodneyRichardson
Contributor Author

I think this can be closed now, as #235 is merged.

@jkowalleck
Member

jkowalleck commented Jun 10, 2022

re: #227 (comment)
Feel free to close your issue, @RodneyRichardson
