Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DISCUSSION: SPDX 3 data model #240

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
module github.com/spdx/tools-golang

go 1.13
go 1.18

require (
github.com/anchore/go-struct-converter v0.0.0-20221118182256-c68fdcfa2092
Expand Down
53 changes: 53 additions & 0 deletions spdx/v3/v3_0/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
# SPDX 3 Model Patterns

This is a non-exhaustive list of some patterns we are able to use in Golang to work with an SPDX 3 data model.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you sooo much for putting this level of detail and even mocking out all the examples! You've made it super understandable and easy to review!


I've included some variations I think we should consider. An explanation is as follows, along with some pros/cons:

- [embedded first](embedded_first) - lean into modifying go structs directly with exported fields, using struct embedding.
This requires using interfaces, but the interfaces are simple single-functions to get an editable embedded struct.
- pro: interfaces are simple
- pro: modifying structs is simple and idiomatic
- pro: minimal surface area: very few additional functions necessary
- con: construction is confusing -- which nesting has field X?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea this is something i really dislike about embedding as well, the constructors are very non-intuitive.. especially to a user who does not really have to the inheritance relationships to actually use SPDX.

- con: naming is hard, need `Package` and `IPackage`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an interesting dimension to consider also is addition on new fields and compatibility between versions, i think each of them may have pros/cons in this regard as well. (like in interfaces we can just have the v3.1 interface embed the v3.0 interface and make backward compatibility simple.



- [interfaces only](interfaces_only) - only export interfaces and use interfaces to interact with structs.
NOTE: this also includes a variant that setters return `error`, which could be applied to other options.
This option only exports the interface types and simple no-arg constructor functions.
- pro: idiomatic getter naming (without `Get`)
- pro: single type exported, data structs are not
- pro: minimal surface area: no extra helper functions/structs that we may decide to change later
- con: absolutely no JSON/etc. output
- con: getter/setter still not especially idiomatic
- con: duplicated code because no embedding
- con: construction is tedious and could be error-prone if referencing wrong variable name


- [interfaces only with constructors](interfaces_only/constructors)
- (generally same pros/cons as above), but:
- pro: much simpler construction pattern for users
- pro: single validation spot during construction (with returning `error` pattern)
- con: more structs and functions exported which cannot be changed without breaking backwards compatibility


- [inverted](inverted) - since we know the entire data model and it does not need to be extended,
it is possible to implement an uber-struct without the need to use interfaces at all.
I thought I'd mention this for some semblance of completeness, but this is probably a very bad idea.
- pro: single set of types exported
- pro: fairly easy to interact with existing documents
- con: easy to construct incorrectly
- con: potential unneecssary memory usage


- [embedded with interfaces and constructors](embedded_with_interfaces_and_constructors) - takes the approach of both
embedding and interfaces, along with constructor functions. This makes creation reasonably simple and working with
existing documents reasonably simple.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like this one the best! i think there is one aspect which i'm not sure what the solution to is (which may be generic to anything that is implemented with interfaces and not just to this one)... when we have a field like a slice of elements, like To []Element. Once we get hold of an Element, how do we get back to the child class of the Element? Like a Package is an Element but when we get a hold of the Package as an Element in a relationship, how do we know its a Package?

- pro: probably the most idiomatic Go of these options
- pro: simple construction
- pro: reasonably simple to work with existing documents
- pro: eliminates some code duplication seen with `interfaces_only`
- con: getter/setter still not especially idiomatic
- con: larger surface area: at least 4 exported parts for each type
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... yea i wonder if there's anyhting we can do with getters/setters here with reflection...

why do we need to export the internal struct again? can you remind me?

- con: absolutely no JSON/etc. output
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure what you mean by this, is it just because the fields are not public thus no serialized JSON output?

161 changes: 161 additions & 0 deletions spdx/v3/v3_0/embedded_first/model.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,161 @@
package v3_0

import "time"

type ElementID string

// ----------------------- SpdxDocument -----------------------

type SpdxDocument struct {
ElementCollection

NamespaceMap map[string]string
}

type ISpdxDocument interface {
IElementCollection
}

// ----------------------- ElementCollection -----------------------

type ElementCollection struct {
Element

RootElements []IElement
Elements []IElement
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the problem with this implementation is that RootElement is not class Element but any class that inherits from Element. In Go, I guess, that translates to "any struct that implements the Element interface".

Copy link
Collaborator Author

@kzantow kzantow May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in this variant, the I* types are all interfaces -- so these are "any types implementing the IElement interface". These could be named differently... naming is hard! there doesn't seem to be a way to avoid interfaces unless we made any reference field an any /interface{} type, but then there's no compiler enforcement of elements going in there. (Also I probably modeled this wrong, I was going by the software profile svg which appears to have * for both of these fields).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I think personally I'm leaning towards the embedded_with_interfaces_and_constructors variant, with setters that return error, as the example currently shows. This allows generated code to enforce valid values for all fields and for a user to most quickly understand when things failed rather than needing to run a full validation once all values are set. It also eliminates the need to export the data structs (all the embedded variants need exported structs)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my bad: the property is of type IElement

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work. It would be easier to generate:

package main

import "fmt"

type ElementP struct {
	Name string
}

type Element interface {
	AsElement() ElementP
}

type Artifact struct {
	ElementP
	StandardName []string
}

func (a Artifact) AsElement() ElementP {
	return a.ElementP
}

type ElementCollection struct {
	ElementP
	RootElement Element
}

func main() {
	a := Artifact{
		ElementP: ElementP{
			Name: "AnArtifact",
		},
		StandardName: []string{"TheArtifact"},
	}

	ec := ElementCollection{
		ElementP: ElementP{
			Name: "AnElementCollection",
		},
		RootElement: a,
	}

	fmt.Println(ec)
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more issue: The jinja renderer I am using from the shacl2code project can only generate one file. So if we were to generate a model and a constructor file, we would need to roll our own jinja renderer...which I would rather do once I can handle the code better. You are welcome to try building one that will generate multiple files.

Copy link
Collaborator Author

@kzantow kzantow May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One note: there is no requirement to have separate files -- I just did this to separate things logically, but all the code for each of these approaches could be generated in a single file. In other words: if we were to export all 4 types: interface, data struct, constructor, constructor props; all of these could be one after the other in the same file.

From the "generic shacl2code" perspective, I think in the example there's a bit of an inconsistency with the example you posted: only a single struct type gets renamed (ElementP) for an Element interface, but there are no other interfaces created. I think the easiest thing to do in shacl2code is probably: generate an exported interface and a struct for each type. It also probably makes the most sense to name the interface after the shacl type rather than the struct, as one of my examples has. Abstract types are a question: golang would let you indiscriminately construct any abstract type if these structs are exported, and they would need to be generated and exported to be used like superclasses through embedding.

The interfaces_only approach has the potential to model inheritance defined in shacl in the most correct way, I believe -- because abstract classes can omit constructor functions and since the types themselves are not exported, there's no way for a user to create these types incorrectly. It has some complexity that at the time you're generating code for a type, it needs to have access to supertypes to be able to inline all the properties -- a quick look makes me think this wouldn't be too hard to accomplish, since it looks like we can get a reference to a parent class and jinja supports recursion.

An aside: naming is hard! Is the P from ElementP short for Props or Properties? I am sorta partial to Data for these structs despite using Impl in one of the variants, but also like Props, at least for the constructor props types, or other things; happy for any other suggestions but would definitely like to keep it short but maybe more descriptive than a single letter. Interfaces only allows the lowercase name to be used, since it's not exported and we don't have to come up with names, so I guess there's another plus for that one... except for package, which is a reserved word 🤦

Copy link
Collaborator Author

@kzantow kzantow May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an "embedded_second" variant, which I think is similar to what you were suggesting (replace Data with P or some other suffix TBD and the interfaces are the first-class names of the types); I just wanted to point out specifically what the usage would look like, which could be just fine: https://github.com/spdx/tools-golang/pull/240/files#diff-55bf8e9ab2432ee525685cd1e7ef75be715a7e18c4eef3011711b89af64bc389R14

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with you on going with the interface only approach i.e. Abstract class -> Go Interface with getter and setter for each property. Concrete class -> Go struct which implements the getter and setter methods for the abstract class.

I personally would prefer having corresponding internal structs to hold common properties in the concrete classes, but it's not a deal breaker. We have to embed the interfaces anyway (Artifact is actually an abstract class that inherits from Element).

}

type IElementCollection interface {
IElement

AsElementCollection() *ElementCollection
}

// ----------------------- Element -----------------------

type Element struct {
SpdxID ElementID
Name string
}

func (e *Element) AsElement() *Element {
return e
}

type IElement interface {
AsElement() *Element
}

var _ interface {
IElement
} = (*Element)(nil)

// ----------------------- Artifact -----------------------

type Artifact struct {
Element

BuiltTime time.Time
}

func (a *Artifact) AsArtifact() *Artifact {
return a
}

type IArtifact interface {
IElement

AsArtifact() *Artifact
}

var _ interface {
IArtifact
} = (*Artifact)(nil)

// ----------------------- SoftwareArtifact -----------------------

type SoftwareArtifact struct {
Artifact

CopyrightText string
}

type ISoftwareArtifact interface {
IArtifact

AsSoftwareArtifact() *SoftwareArtifact
}

func (a *SoftwareArtifact) AsSoftwareArtifact() *SoftwareArtifact {
return a
}

var _ interface {
ISoftwareArtifact
} = (*SoftwareArtifact)(nil)

// ----------------------- Package -----------------------

type IPackage interface {
ISoftwareArtifact

AsPackage() *Package
}

type Package struct {
SoftwareArtifact

PackageVersion string
}

func (p *Package) AsPackage() *Package {
return p
}

var _ interface {
IPackage
} = (*Package)(nil)

// ----------------------- File -----------------------

type IFile interface {
ISoftwareArtifact

AsFile() *File
}

type File struct {
SoftwareArtifact

ContentType string
}

func (f *File) AsFile() *File {
return f
}

var _ IPackage = (*Package)(nil)

// ----------------------- Relationship -----------------------

type RelationshipType string

type Relationship struct {
Element

RelationshipType RelationshipType
From IElement
To []IElement
}

type IRelationship interface {
IElement

AsRelationship() *Relationship
}

func (r *Relationship) AsRelationship() *Relationship {
return r
}

var _ IRelationship = (*Relationship)(nil)
70 changes: 70 additions & 0 deletions spdx/v3/v3_0/embedded_first/model_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
package v3_0

import (
"fmt"
"testing"
)

func Test_makeAnSpdxDocument(t *testing.T) {
// creating new documents: 2 packages found from 1 file with 2 relationships

// embedded nesting is not ideal when instantiating:
pkg1 := &Package{
SoftwareArtifact: SoftwareArtifact{
Artifact: Artifact{
Element: Element{
SpdxID: "pkg-1",
Name: "pkg-1",
},
},
},
PackageVersion: "1.0.0",
}

// can just reference properties directly, once an object exists:
pkg2 := &Package{}
pkg2.SpdxID = "package-2"
pkg2.Name = "package-2"
pkg2.PackageVersion = "2.0.0"

file1 := &File{}
file1.SpdxID = "file-1"
file1.Name = "file-1"
file1.ContentType = "text/plain"

file1containsPkg1 := &Relationship{
RelationshipType: "CONTAINS",
From: file1,
To: []IElement{pkg1},
}

pkg1dependsOnFile1 := &Relationship{
RelationshipType: "DEPENDS_ON",
From: pkg1,
To: []IElement{pkg2},
}

doc := SpdxDocument{
ElementCollection: ElementCollection{
Element: Element{
SpdxID: "spdx-document",
},
Elements: []IElement{
pkg1,
pkg2,
pkg1dependsOnFile1,
file1containsPkg1,
},
},
}
fmt.Printf("%#v\n", doc)

// working with existing documents

for _, e := range doc.Elements {
if e, ok := e.(IPackage); ok {
e.AsPackage().Name = "updated-name"
}
}
fmt.Printf("%#v\n", doc)
}
Loading
Loading