Remap oai_dc fields dc:type and dc:date #10737

pdurbin · 2024-08-01T18:02:01Z

What this PR does / why we need it:

The oai_dc export and harvesting format has had the following fields remapped:

dc:type was mapped to the field "Kind of Data". Now it is hard-coded to the word "Dataset".
dc:date was mapped to the field "Production Date" when available and otherwise to "Publication Date". Now it is mapped to the field "Publication Date" or the field used for the citation date, if set (see Set Citation Date Field Type for a Dataset).
~~dc:rights was not mapped to anything. Now it is mapped (when available) to terms of use, restrictions, and license.~~ Deferred until this issue:
- Access Rights metadata in OpenAIRE metadata export is being misapplied #5920
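
As a rough illustration of the two remapping rules above, here is a minimal standalone sketch. It is not the code in DublinCoreExportUtil: the class `OaiDcMappingSketch`, its method names, and its parameters are hypothetical, and falling back to the publication date when the configured citation date field is blank is an assumption.

```java
import java.util.Optional;

// Hypothetical sketch of the remapping rules; not the actual exporter code.
final class OaiDcMappingSketch {

    // dc:type is no longer read from "Kind of Data"; it is a constant.
    static String dcType() {
        return "Dataset";
    }

    // dc:date: prefer the value of the field configured as the citation date
    // (if one is set and has a value), otherwise use the publication date.
    // Treating a blank value as "not set" is an assumption of this sketch.
    static String dcDate(String publicationDate, Optional<String> citationDateFieldValue) {
        return citationDateFieldValue
                .filter(value -> !value.isBlank())
                .orElse(publicationDate);
    }

    public static void main(String[] args) {
        System.out.println(dcType());
        System.out.println(dcDate("2024-09-10", Optional.empty()));
        System.out.println(dcDate("2024-09-10", Optional.of("1999-12-31")));
    }
}
```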

As these are backward incompatible changes, they have been emphasized in the release note snippet.

Which issue(s) this PR closes:

Closes Change Dataverse / Dublin Core mapping to improve OAI-PMH harvesting #8129

Special notes for your reviewer:

Should these backward-incompatible changes be hidden behind a feature flag?

Suggestions on how to test this:

See rules above under "what this PR does". Also, below are some examples of before and after.

Before

dc:date is mapped to the field "Production Date" when available and otherwise to "Publication Date".
We see "survey" under dc:type because it was entered in the "Kind of Data" field. dc:type will be absent if "Kind of Data" isn't filled in.

<oai_dc:dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:title>Darwin's Finches</dc:title>
  <dc:identifier>https://doi.org/10.5072/FK2/QHIUBQ</dc:identifier>
  <dc:creator>Finch, Fiona</dc:creator>
  <dc:publisher>Root</dc:publisher>
  <dc:description>Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds.</dc:description>
  <dc:subject>Medicine, Health and Life Sciences</dc:subject>
  <dc:date>2024-09-10</dc:date>
  <dc:type>survey</dc:type>
</oai_dc:dc>

After

dc:date is mapped to the field "Publication Date" or the field used for the citation date, if set (see Set Citation Date Field Type for a Dataset).
dc:type is hard-coded to "Dataset".

<oai_dc:dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:title>Darwin's Finches</dc:title>
  <dc:identifier>https://doi.org/10.5072/FK2/QHIUBQ</dc:identifier>
  <dc:creator>Finch, Fiona</dc:creator>
  <dc:publisher>Root</dc:publisher>
  <dc:description>Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds.</dc:description>
  <dc:subject>Medicine, Health and Life Sciences</dc:subject>
  <dc:date>2024-09-10</dc:date>
  <dc:type>Dataset</dc:type>
</oai_dc:dc>

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No.

Is there a release notes update needed for this change?:

Yes, included.

Additional documentation:

None.

The `oai_dc` export and harvesting format has had the following fields remapped:

- dc:type was mapped to the field "Kind of Data". Now it is hard-coded to the word "Dataset".
- dc:date was mapped to the field "Production Date" when available and otherwise to "Publication Date". Now it is mapped only to the field "Publication Date".
- dc:rights was not mapped to anything. Now it is mapped (when available) to terms of use, restrictions, and license.

pdurbin · 2024-08-01T18:06:31Z

@jggautier heads up that this relates to this issue in that we are now adding "dc:rights" to the oai_dc export/harvesting format:

Feature Request/Idea: Make sure that for datasets published with CC0 waiver or standard license, metadata exports include the waiver or license #8798

pdurbin · 2024-08-01T18:08:28Z

@tcoupin I'm requesting a review from you because you modified the dc:date logic in the following pull request and I changed it (as explained above):

oai_dc export: use publication date if production date is not set #8733

qqmyers · 2024-08-01T18:09:34Z

Re: dc:date - should it be mapped to the same field as https://guides.dataverse.org/en/latest/api/native-api.html#set-citation-date-field-type-for-a-dataset ? That is publicationDate by default.

coveralls · 2024-08-01T18:14:50Z

coverage: 20.735%. remained the same
when pulling 01e266c on 8129-harvesting
into 4143031 on develop.

pdurbin · 2024-08-01T20:02:43Z

@qqmyers well, Publication Date is what @philippconzett asked for in the issue (#8129).

qqmyers · 2024-08-01T20:11:46Z

@philippconzett's notes also point out that this date is potentially going to be interpreted as the citation date. Since we allow configuring that in the local installation, it seems like it could be confusing to hardcode it for harvesting. If the harvester used the field from that setting, citations would be consistent in the local display and harvesting sites, and it would default to publicationDate as requested in the issue.

pdurbin · 2024-08-01T21:03:25Z

I don't have a strong opinion about it.

philippconzett · 2024-08-02T03:23:31Z

I think @qqmyers's suggestion for dc:date makes sense.

plecor · 2024-08-02T09:06:26Z

I've taken @tcoupin's role on Dataverse issues, so I am looking at this for him.

Part of the context for the change he implemented (mapping dc:date to Publication Date if Production Date is empty) was that when Dataverse harvests another OAI-PMH repo, dc:date is mapped to productionDate and this production date is then used in the citation of the harvested dataset. #8733 and #8732 were both part of an effort to guarantee the coherence between citation dates when harvesting another Dataverse.

So I agree with @qqmyers's suggestion on dc:date.

Hardcoding dc:type to Dataset would certainly simplify things. In practice, I know some Dataverse instances allow for kindOfData values that are not synonyms of 'Dataset'. For instance: https://entrepot.recherche.data.gouv.fr/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=doi:10.57745/DZIM2L

There might be an alternative solution where there is always at least a dc:type tag with value Dataset, followed by the list of kindOfData values (making sure that Dataset occurs only once)? That would however mean that not all dc:type values come from a controlled vocabulary.
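
To make that alternative concrete, here is a minimal sketch of how the dc:type values could be assembled. Everything in it is hypothetical (the class `DcTypeValuesSketch` and the method `dcTypeValues` are not Dataverse code), and comparing against "Dataset" case-insensitively is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: always emit "Dataset" first, then any kindOfData values,
// without repeating "Dataset" or any other value.
final class DcTypeValuesSketch {

    static List<String> dcTypeValues(List<String> kindOfData) {
        // LinkedHashSet keeps insertion order and drops exact duplicates.
        Set<String> values = new LinkedHashSet<>();
        values.add("Dataset");
        for (String kind : kindOfData) {
            if (kind == null || kind.isBlank()) {
                continue; // skip empty entries
            }
            String trimmed = kind.trim();
            // Assumption: "dataset" in any letter case counts as the fixed value.
            if (trimmed.equalsIgnoreCase("Dataset")) {
                continue;
            }
            values.add(trimmed);
        }
        return new ArrayList<>(values);
    }

    public static void main(String[] args) {
        System.out.println(dcTypeValues(List.of("survey", "dataset", "survey")));
    }
}
```

Each returned value would become its own dc:type element; only the first is guaranteed to come from a controlled vocabulary.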

pdurbin · 2024-08-02T19:01:17Z

@plecor thanks. One thing to consider with "dc:type" is that types other than datasets (like software and workflows) are coming...

dataset types (software, workflow, etc.) - initial support #10694

... so maybe we can revisit "dc:type" once that pull request is merged.

To all, I pushed some tests to exercise export and setting the citation date.

Now I'm trying to see if there's a small change I can make to DublinCoreExportUtil to get the citation date out.

I can get just the year (YYYY) with code like this...

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String citation = version.getCitation();
// We're looking for ", YYYY, " in a citation like this:
// Finch, Fiona, 1999, "Darwin's Finches", https://doi.org/10.5072/FK2/WSSYBE, Root, V1
Pattern pattern = Pattern.compile(", (\\d{4}), ");
Matcher matcher = pattern.matcher(citation);
// Only write dc:date when a year was found; group(1) throws if find() returned false.
if (matcher.find()) {
    String yearInCitation = matcher.group(1);
    writeFullElement(xmlw, dcFlavor + ":" + "date", yearInCitation);
}

... but I need the full YYYY-MM-DD version to put in the oai_dc output. 🤔 If you have any ideas for me, please let me know.

pdurbin · 2024-08-02T20:25:17Z

I dug a little more and our citation code is focused on returning just a four-digit year for the date. This would be a change from what we do now (YYYY-MM-DD).

@philippconzett @plecor @qqmyers what do you think? Should we change oai_dc to YYYY for dc:date?

The spec Philipp found seems to say it's ok. Check out the year 1650 as an example at https://www.base-search.net/about/en/faq_oai.php#dc-date
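
If both forms end up being acceptable, a test could check that a dc:date value is either a bare year or a full date. The sketch below is only an illustration under that assumption; the class `DcDateFormatSketch` and the method `isYearOrFullDate` are hypothetical and not part of Dataverse.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical sketch: accept either YYYY or a valid YYYY-MM-DD calendar date.
final class DcDateFormatSketch {

    static boolean isYearOrFullDate(String value) {
        if (value == null) {
            return false;
        }
        if (value.matches("\\d{4}")) {
            return true; // bare four-digit year, e.g. 1650
        }
        if (!value.matches("\\d{4}-\\d{2}-\\d{2}")) {
            return false; // anything else must look like YYYY-MM-DD
        }
        try {
            LocalDate.parse(value); // rejects impossible dates such as 2024-02-30
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isYearOrFullDate("1650"));
        System.out.println(isYearOrFullDate("2024-09-10"));
        System.out.println(isYearOrFullDate("2024-02-30"));
    }
}
```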

qqmyers · 2024-08-02T21:46:58Z

Since the citationDateFieldType is part of the Dataset, I'd think at some point it could/should be part of the DatasetDTO and JSON export, thereby being available to other exporters (will the SPA or other client need this info (in the JSON returned from the dataset api) at some point?).

If that's too much for now, I think the idea of parsing it from the citation as YYYY makes sense, assuming that's sufficient for how people want to use that field. Alternately, I think you could 'go around' the exporter SPI interface and get the full value directly pretty easily as well, e.g. with something like:

        DatasetFieldType citationDataType = jakarta.enterprise.inject.spi.CDI.current()
                .select(DatasetServiceBean.class).get()
                .findByGlobalId(globalId.asString())
                .getCitationDateDatasetFieldType();
        if (citationDataType != null) {
            date = dto2Primitive(version, citationDataType.getName());
        } else {
            date = datasetDto.getPublicationDate();
        }

This would not be the only current exporter doing that (e.g. the DDI exporter grabs the ExportInstallationAsDistributorOnlyWhenNotSet Setting it needs).

pdurbin · 2024-08-05T13:56:40Z

Alternately, I think you could 'go around' the exporter SPI interface and get the full value directly pretty easily as well, e.g. with something like...

@qqmyers I gave this a try but citationDataType.getName() yields a four digit year (YYYY) so it's no better than getting (just) the year from the citation with the regex I showed above.

qqmyers · 2024-08-05T14:57:10Z

Are you thinking of

dataverse/src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java

Line 1468 in ff26ae8

public Date getCitationDate() {

? citationDataType.getName() should get the name of the field type and then dto2Primitive should get the full value from the field itself in the metadatablock, which I think should always be the full date (unless there's a field that is YYYY only?).
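
The two-step lookup described here (field type name first, then the field's value) can be illustrated with a plain map standing in for the metadata block. This is a hypothetical sketch, not Dataverse code: `CitationDateLookupSketch`, `resolveDate`, and the map-based representation are assumptions, as is falling back to the publication date when the field has no value.

```java
import java.util.Map;

// Hypothetical sketch: the citation date setting only names a field;
// the date itself comes from that field's value in the dataset metadata.
final class CitationDateLookupSketch {

    static String resolveDate(Map<String, String> fieldValuesByName,
                              String citationDateFieldName,
                              String publicationDate) {
        if (citationDateFieldName == null) {
            return publicationDate; // no citation date field configured
        }
        // Step 2: look up the value stored under the configured field name.
        // Assumption: fall back to the publication date if the field is empty.
        return fieldValuesByName.getOrDefault(citationDateFieldName, publicationDate);
    }

    public static void main(String[] args) {
        Map<String, String> fields = Map.of("dateOfDeposit", "1999-12-31");
        System.out.println(resolveDate(fields, "dateOfDeposit", "2024-09-10"));
        System.out.println(resolveDate(fields, null, "2024-09-10"));
    }
}
```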

philippconzett · 2024-08-28T16:10:52Z

@jggautier Thanks for clarifying. Is there a way to figure out what information we deliver to harvesters like BASE Bielefeld? See the initial post in #8129.

jggautier · 2024-08-28T20:28:48Z

@philippconzett, do you mean if there's a way to figure out what we currently deliver to harvesters like BASE Bielefeld or what we should deliver? Sorry if that sounds like a dumb question lol. I've been focused a lot on how decisions are made as well as what decisions are made, which maybe led me to overthink your question!

If it's currently, you wrote in #8129 about what we currently deliver when it comes to rights metadata, which is that we don't deliver anything since Dataverse doesn't provide the field dc:rights. I can say that nothing's changed about this since you opened #8129 a few years ago. So when I asked if it's appropriate that this PR being merged will close #8129, and that there's discussion in that GitHub issue that isn't addressed, I was mostly thinking about your comments related to dc:rights and degrees of open access.

For what we should start delivering to harvesters like BASE Bielefeld, you mentioned the guidance at https://www.base-search.net/about/en/faq_oai.php#dc-rights, which recommends the two vocabularies you pointed to earlier today: the info-eu-repo-Access-Rights vocabulary and the COAR-Access-Rights vocabulary.

I think it'll be helpful to consider what I wrote in #5920, where I described what we've learned, and the challenges, about how the info-eu-repo-Access-Rights vocabulary is being included in the OpenAIRE exports that Dataverse creates.

This might be a matter of scoping and timing, too, right? We could create a new GitHub issue specifically about the use of the info-eu-repo-Access-Rights and COAR-Access-Rights vocabularies, that mentions what's discussed in #8129. So when #8129 is closed because this PR is merged, the unaddressed goals you mentioned in #8129 aren't lost and there's a place where the community can focus on how to address those goals.

And if we can learn how @pdurbin and others made the decisions in this PR about what goes into dc:rights, it'll be easier to think about how effective those decisions are.

philippconzett · 2024-08-30T05:46:55Z

Hi @jggautier! Thanks for disentangling this! I wasn't aware of #5920, but have read up about it now and added some comments there. I don't really know what this means for #8129. Maybe a temporary solution to make dataset metadata from Dataverse more visible in OpenAIRE could be a kind of "inverted" and slightly adapted version of what you described in #5920:

openAccess: If any files are set to non-restricted, the metadata export uses "openAccess".
restrictedAccess: If all of the files in the dataset are set to restricted and the option to request access is enabled (people are allowed to request access using Dataverse's request access feature), the metadata export uses "restrictedAccess".
closedAccess: If all of the files in the dataset are set to restricted and the option to request access is disabled, the metadata export uses "closedAccess".
embargoedAccess: If all of the files in the dataset are set to embargoed, the metadata export uses "embargoedAccess".
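
The four rules above could be sketched as a single decision function. This is only an illustration with hypothetical names (`AccessRightsSketch`, `FileState`, `accessRights`); the rules do not say which one wins when several apply or what happens for a dataset with no files, so checking the embargo rule first and returning no value for an empty file list are assumptions.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the proposed access rights rules; not Dataverse code.
final class AccessRightsSketch {

    // Minimal stand-in for the per-file flags the rules refer to.
    static final class FileState {
        final boolean restricted;
        final boolean embargoed;

        FileState(boolean restricted, boolean embargoed) {
            this.restricted = restricted;
            this.embargoed = embargoed;
        }
    }

    static Optional<String> accessRights(List<FileState> files, boolean requestAccessEnabled) {
        if (files.isEmpty()) {
            return Optional.empty(); // assumption: no files, no statement
        }
        // Assumption: the embargo rule takes precedence over the others.
        if (files.stream().allMatch(f -> f.embargoed)) {
            return Optional.of("embargoedAccess");
        }
        if (files.stream().anyMatch(f -> !f.restricted)) {
            return Optional.of("openAccess");
        }
        // All files are restricted at this point.
        return Optional.of(requestAccessEnabled ? "restrictedAccess" : "closedAccess");
    }

    public static void main(String[] args) {
        List<FileState> mixed = List.of(new FileState(true, false), new FileState(false, false));
        System.out.println(accessRights(mixed, false));
    }
}
```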

pdurbin · 2024-09-04T20:33:10Z

could you write about how these mapping decisions were made?

@jggautier back in 59850ce these lines were added as part of PR #3308:

writeFullElement(xmlw, "dcterms:license", version.getLicense());
writeFullElement(xmlw, "dcterms:rights", version.getTermsOfUse());
writeFullElement(xmlw, "dcterms:rights", version.getRestrictions());

This was for Dataverse 4.5 ( https://github.com/IQSS/dataverse/releases/tag/v4.5 ) when harvesting was first introduced (in 4.x). It looks like the code was mostly worked on by @landreev and @sekmiller. In short, the logic has been here for 8 years and I can't find any comment on why we do it this way. @jggautier what you're showing in that screenshot is a reflection of the same, unchanged logic.

The code above is for the "dcterms" flavor of Dublin Core. In this current PR, I copied the logic above for the "dc" flavor. I hope this helps!

jggautier · 2024-09-06T15:30:00Z

Ah, thanks @pdurbin! I updated the Dataverse crosswalk to reflect this, specifically the part of the crosswalk showing that what's entered in the Restrictions field is included in the DC Terms export, as dcterms:rights like you wrote. The crosswalk used to indicate this, and for some reason I don't remember now, in 2022 I edited it to read that it was "(Not mapped)".

To learn more about why the predefined license, Terms of Use, and Restrictions are included in the DC Terms export, I tried to find the Functional Requirements Document mentioned in that PR you linked to. The Functional Requirements Document folder in our Google Drive has some info, but I haven't seen any FRDs so far that go into enough detail.

I think we should:

Remove from this PR decisions about what's added to dc:rights and make sure that this PR does not close the issue at Change Dataverse / Dublin Core mapping to improve OAI-PMH harvesting #8129 or
Agree that what this PR does is okay for now only because it matches what's included in the DC Terms metadata exports, make sure that this PR as designed right now does not close the issue at Change Dataverse / Dublin Core mapping to improve OAI-PMH harvesting #8129, and update Change Dataverse / Dublin Core mapping to improve OAI-PMH harvesting #8129 to reflect what's been decided in this PR or
Move this PR from "In Review" to "In Progress" so that research can be done to determine how information about the licenses and terms metadata of deposits should be included in the Dublin Core metadata that Dataverse exports, including tackling what's discussed in Change Dataverse / Dublin Core mapping to improve OAI-PMH harvesting #8129

pdurbin · 2024-09-09T20:50:42Z

As discussed in Slack and elsewhere:

I backed out any changes having to do with dc:rights and added a large font note to the top of Change Dataverse / Dublin Core mapping to improve OAI-PMH harvesting #8129 (which this PR closes) to look to Access Rights metadata in OpenAIRE metadata export is being misapplied #5920 instead. @cmbz @scolapasta I'm not sure where you want Access Rights metadata in OpenAIRE metadata export is being misapplied #5920 in terms of priorities.
I made some doc changes suggested by @landreev

@landreev I'm going to unassign myself but please let me know if you'd like me to jump back on this branch and do any additional coding or testing!

github-actions · 2024-09-09T20:54:21Z

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:8129-harvesting

ghcr.io/gdcc/configbaker:8129-harvesting

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

landreev · 2024-09-10T12:18:30Z

@pdurbin
Just for clarity, could you please attach an actual example - 2 exported oai_dc fragments, before and after, to illustrate the final result of the changes made in the PR. (For the benefit of somebody reading the PR in the future; the discussion above is quite extensive and potentially confusing)

pdurbin · 2024-09-10T14:39:57Z

@landreev sure, I added two XML examples, before and after, to the description of this PR.

landreev

I have confirmed that the change does not affect harvesting. In normal practice, this should not even be a concern, as two Dataverses should never use the oai_dc as the format for harvesting from each other. But it's not entirely impossible that it will be a practical use case for somebody. Plus it would simply feel wrong, for Dataverse not to be able to import its own metadata exports. So, happy to report that 2 Dataverses can still harvest from each other using the format.


pdurbin added the labels Feature: Harvesting, Size: 10 (a percentage of a sprint; 7 hours), GREI 3 Search and Browse, and FY25 Sprint 3 on Aug 1, 2024

pdurbin requested a review from landreev August 1, 2024 18:02

pdurbin requested a review from tcoupin August 1, 2024 18:07

pdurbin mentioned this pull request Aug 1, 2024

As a data repository, I need to harvest additional metadata in OAI_DC records #4176

Open

pdurbin mentioned this pull request Aug 1, 2024

Change Dataverse / Dublin Core mapping to improve OAI-PMH harvesting #8129

Closed

pdurbin self-assigned this Aug 2, 2024

add tests for export and citation date #8129

c59f743

pdurbin removed their assignment Aug 2, 2024

pdurbin added 2 commits August 5, 2024 11:51

map dc:date to pub date or field for citation date #8129

a9f7d79

Merge branch 'develop' into 8129-harvesting #8129

a4f98de

cmbz added the FY25 Sprint 4 and FY25 Sprint 5 labels Aug 28, 2024

cmbz mentioned this pull request Aug 28, 2024

GREI 3: HDV Task - Improve OAI-PMH Harvesting IQSS/dataverse-pm#171

Open

60 tasks

Merge branch 'develop' into 8129-harvesting #8129

2f2a7f7

jggautier mentioned this pull request Sep 9, 2024

Access Rights metadata in OpenAIRE metadata export is being misapplied #5920

Open

pdurbin self-assigned this Sep 9, 2024

pdurbin added 4 commits September 9, 2024 16:28

back out of any changes to dc:rights #8129

1968b87

remove OAI-PMH changes from API changelog (also in release note) #8129

c4e3097

tweak release note, mention backward incompatibility, reexport #8129

2805eb5

Merge branch 'develop' into 8129-harvesting #8129

01e266c

pdurbin changed the title ~~Remap oai_dc fields dc:type, dc:date, and dc:rights~~ Remap oai_dc fields dc:type and dc:date Sep 9, 2024

pdurbin removed their assignment Sep 9, 2024

landreev approved these changes Sep 10, 2024

landreev removed their assignment Sep 10, 2024

stevenwinship self-assigned this Sep 11, 2024

stevenwinship merged commit 4b96cec into develop Sep 11, 2024
23 checks passed

stevenwinship removed their assignment Sep 11, 2024

pdurbin added this to the 6.4 milestone Sep 11, 2024

stevenwinship deleted the 8129-harvesting branch September 17, 2024 14:52

pdurbin commented Aug 1, 2024

pdurbin commented Aug 1, 2024

pdurbin commented Aug 1, 2024

qqmyers commented Aug 1, 2024

coveralls commented Aug 1, 2024

pdurbin commented Aug 1, 2024

qqmyers commented Aug 1, 2024

pdurbin commented Aug 1, 2024

philippconzett commented Aug 2, 2024

plecor commented Aug 2, 2024

pdurbin commented Aug 2, 2024

pdurbin commented Aug 2, 2024

qqmyers commented Aug 2, 2024

pdurbin commented Aug 5, 2024

qqmyers commented Aug 5, 2024

philippconzett commented Aug 28, 2024

jggautier commented Aug 28, 2024

philippconzett commented Aug 30, 2024

pdurbin commented Sep 4, 2024

jggautier commented Sep 6, 2024

pdurbin commented Sep 9, 2024

github-actions bot commented Sep 9, 2024

landreev commented Sep 10, 2024

pdurbin commented Sep 10, 2024

landreev left a comment
