Update the codemeta metadata block to add some more structure for machine actionability #11087

Open
wants to merge 8 commits into base: develop

Contributor

@doigl doigl commented Dec 11, 2024

What this PR does / why we need it:
Currently, the fields MemoryRequirements, ProcessorRequirements, and StorageRequirements are just free-text fields, which makes it difficult to use them in an automated process that provides the right resources for running a Jupyter notebook or a container. Adding subfields with controlled vocabularies to these fields would make it easier to differentiate between different types and to identify the right amount of resources, such as memory.
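
To illustrate why structured subfields help automation, here is a minimal Python sketch. The layout (a numeric value, a unit from a controlled vocabulary, and a type) and the names `MemoryRequirement`, `UNIT_FACTORS`, and `to_bytes` are assumptions made for illustration; they are not the subfield names defined in codemeta.tsv.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary of units, interpreted as binary multiples.
UNIT_FACTORS = {
    "KB": 1024,
    "MB": 1024**2,
    "GB": 1024**3,
    "TB": 1024**4,
    "PB": 1024**5,
}


@dataclass(frozen=True)
class MemoryRequirement:
    """A structured requirement, as opposed to free text like '16 GB RAM'."""

    value: int  # numeric amount
    unit: str   # must be a key of UNIT_FACTORS
    kind: str   # for example "RAM" or "GPU" (assumed vocabulary)

    def to_bytes(self) -> int:
        """Convert the requirement to bytes so a scheduler can compare it."""
        if self.unit not in UNIT_FACTORS:
            raise ValueError(f"unknown unit: {self.unit!r}")
        return self.value * UNIT_FACTORS[self.unit]
```

With values in this shape, a provisioning script could call `MemoryRequirement(16, "GB", "RAM").to_bytes()` directly, without having to guess at the format of a free-text string.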

In addition to these changes, this pull request also adds new subfields for softwareRequirements and softwareSuggestions to distinguish between InfoUrl (documentation page) and URL (DownloadURL) for dependencies, and adjusts the termURI of contIntegration to continuousIntegration (codemeta v3.0).

Which issue(s) this PR closes:

Special notes for your reviewer:

This pull request only changes the codemeta.tsv file. As it introduces new subfields for the metadata fields storageRequirements and memoryRequirements (that were simple fields and no compound fields before), existing metadata in these fields have to be migrated (manually?).

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Metadata form:
[screenshot: grafik]

Rendered metadata:
[screenshot: grafik-1]

Is there a release notes update needed for this change?:
Yes, if this is applied, it should be mentioned in the release notes, together with a description of how to migrate existing metadata in the changed fields.

Additional documentation:

…d storageRequirements, added new subfield for softwareRequirements and softwareSuggestions to distinguish between InfoUrl (documentation page) and URL (DownloadURL) for dependencies, adjusted termURI of contIntegration to codemeta v3.0 in codemeta.tsv

@ofahimIQSS ofahimIQSS added the Size: 3 A percentage of a sprint. 2.1 hours. label Jan 28, 2025
@cmbz cmbz added FY25 Sprint 15 FY25 Sprint 15 (2025-01-15 - 2025-01-29) FY25 Sprint 16 FY25 Sprint 16 (2025-01-29 - 2025-02-12) labels Jan 29, 2025
@pdurbin pdurbin self-assigned this Jan 30, 2025
Member

@pdurbin pdurbin left a comment

Overall, this looks like a nice incremental improvement. @doigl I'm leaving some specific feedback below. Thanks!

- codeMeta20 Software Metadata (CodeMeta v2.0) https://codemeta.github.io/terms/
+ codeMeta20 Software Metadata (CodeMeta 2.0) https://codemeta.github.io/terms/
Member

I'm confused. Should this be 3.0 instead of 2.0? Or are we not there yet?

Also, is it important to remove the "v"? In c5a6a8f @poikilotherm added a "v" to src/main/java/propertyFiles/codeMeta20.properties.

Contributor Author

This is addressed, thanks for pointing it out. The name of the metadata block is still codeMeta20, but the displayName is now "Software Metadata (CodeMeta v3.0)".

Member

Thanks for fixing the display name.

But what about the name? Is the plan to stick with "codeMeta20" forever? Or will we someday switch to "codeMeta30" or "codeMeta40"? Should we consider shortening it to just "codeMeta" (or "codemeta")?

Contributor Author

I would be in favour of renaming the block to codemeta, but then we really need detailed upgrade instructions to avoid a duplication of the metadata block.
@poikilotherm what was the original reason to include the version in the block name?

Member

These are just some screenshots as of 28ff8ed for my reference or others interested in the PR. Overall, the fields look good to me.

displayOnCreate fields

Screenshot 2025-01-30 at 11-53-30 Add New Dataset - Root

all fields

Screenshot 2025-01-30 at 11-53-07 pyDataverse - Root

scripts/api/data/metadatablocks/codemeta.tsv
@@ -1,5 +1,5 @@
#metadataBlock name dataverseAlias displayName blockURI
Member

I'm just going to add this comment here at the top, but yes, a release note should be added. Please see https://guides.dataverse.org/en/6.5/developers/version-control.html#writing-release-note-snippets

@doigl I noticed you wrote this: "As it introduces new subfields for the metadata fields storageRequirements and memoryRequirements (that were simple fields and no compound fields before), existing metadata in these fields have to be migrated (manually?)."

Do you plan to provide an SQL upgrade script? If not, perhaps tell people they are on their own since the block is experimental? 🤔 🤷

Contributor Author

@pdurbin I added a release note, but so far without an upgrade script. This SQL statement

    select dvo.identifier, dt.name as name, dfv.value as val
    from datasetfield as df, datasetfieldtype as dt, datasetfieldvalue as dfv, dvobject as dvo, datasetversion as dv
    where df.id = dfv.datasetfield_id
      and df.datasetfieldtype_id = dt.id
      and dvo.id = dv.dataset_id
      and df.datasetversion_id = dv.id
      and name IN ('memoryRequirements', 'storageRequirements');

identifies the datasets with values in memoryRequirements and storageRequirements, and the following statement

    select upper(substring(dfv.value from '[kmgtpKMGTP][Bb]')) as unit,
           substring(dfv.value from '\d{1,4}') as numb_val,
           upper(substring(dfv.value from 'RAM|Ram|ram|GPU|Gpu|gpu|NPU|Npu|npu')) as ramtype,
           dvo.identifier, dt.name as name, dfv.value as val
    from datasetfield as df, datasetfieldtype as dt, datasetfieldvalue as dfv, dvobject as dvo, datasetversion as dv
    where df.id = dfv.datasetfield_id
      and df.datasetfieldtype_id = dt.id
      and dvo.id = dv.dataset_id
      and df.datasetversion_id = dv.id
      and name IN ('memoryRequirements', 'storageRequirements');

extracts the information for the subfields.
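
For anyone who would rather preview the extraction outside the database, the same three patterns can be tried in Python. This is only a sketch under stated assumptions: the function name `split_requirement` and the helper `_first` are hypothetical, the result keys simply reuse the column aliases from the query above, and the behavior of PostgreSQL's `substring(... from pattern)` is approximated by taking the first regex match (or `None` when nothing matches).

```python
import re
from typing import Optional

# The same patterns as in the SQL statement above.
UNIT_RE = re.compile(r"[kmgtpKMGTP][Bb]")
NUMBER_RE = re.compile(r"\d{1,4}")
TYPE_RE = re.compile(r"RAM|Ram|ram|GPU|Gpu|gpu|NPU|Npu|npu")


def _first(pattern: "re.Pattern[str]", text: str) -> Optional[str]:
    """Return the first match of pattern in text, or None if there is none."""
    match = pattern.search(text)
    return match.group(0) if match else None


def split_requirement(text: str) -> dict:
    """Split a free-text requirement into unit, number, and type candidates."""
    unit = _first(UNIT_RE, text)
    kind = _first(TYPE_RE, text)
    return {
        "unit": unit.upper() if unit else None,
        "numb_val": _first(NUMBER_RE, text),
        "ramtype": kind.upper() if kind else None,
    }
```

Note the limits inherited from the patterns: `\d{1,4}` keeps only the first run of at most four digits, so a value such as "16384 MB" is cut to "1638" and "1.5 GB" yields "1". Results would therefore need a manual check before being written to the new subfields.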

But I'm a bit hesitant (and perhaps just not experienced enough to really dare) to try to automatically generate the insert statements for the subfields. For our installation, I would perhaps rather try to add these with a script using the API.

What do you think? Have you had similar changes in metadata blocks before, and is there a good way to handle this?

Member

I can understand your hesitancy. I just made this PR to suggest some changes to the release note:

I like the idea of at least showing people which datasets are affected, so I copied that part of the SQL into the note. Actually, I just realized there are two. Maybe you can add that?

Member

@pdurbin pdurbin Feb 12, 2025

@doigl I see you incorporated ideas from PR 25 into 415be06, so I'll resolve this and move the PR into "ready for QA". Thanks!

pdurbin added a commit that referenced this pull request Feb 11, 2025
Member

@pdurbin pdurbin left a comment

Some more feedback.

Member

@pdurbin pdurbin left a comment

I haven't done much testing but I think this is ready for QA. Approved.

@cmbz cmbz added the FY25 Sprint 17 FY25 Sprint 17 (2025-02-12 - 2025-02-26) label Feb 12, 2025
Projects
Status: Ready for QA ⏩
Development

Successfully merging this pull request may close these issues.

Suggestion: Update the CodeMeta Metadata Block to add some more structure for machine actionability
4 participants