Merge branch 'develop' into 8372-gdcc-xoai-library
landreev committed Sep 28, 2022
2 parents 5ab41b0 + 7c1683b commit e448b68
Showing 10 changed files with 205 additions and 54 deletions.
26 changes: 21 additions & 5 deletions doc/sphinx-guides/source/admin/metadataexport.rst
@@ -11,19 +11,35 @@ Publishing a dataset automatically starts a metadata export job, that will run i

A scheduled timer job that runs nightly will attempt to export any published datasets that for whatever reason haven't been exported yet. This timer is activated automatically on the deployment, or restart, of the application. So, again, no need to start or configure it manually. (See the :doc:`timers` section of this Admin Guide for more information.)

Batch exports through the API
.. _batch-exports-through-the-api:

Batch Exports Through the API
-----------------------------

In addition to the automated exports, a Dataverse installation admin can start a batch job through the API. The following 2 API calls are provided:
In addition to the automated exports, a Dataverse installation admin can start a batch job through the API. The following four API calls are provided:

``curl http://localhost:8080/api/admin/metadata/exportAll``

``curl http://localhost:8080/api/admin/metadata/reExportAll``

The former will attempt to export all the published, local (non-harvested) datasets that haven't been exported yet.
The latter will *force* a re-export of every published, local dataset, regardless of whether it has already been exported or not.
``curl http://localhost:8080/api/admin/metadata/clearExportTimestamps``

``curl http://localhost:8080/api/admin/metadata/:persistentId/reExportDataset?persistentId=doi:10.5072/FK2/AAA000``

The first will attempt to export all the published, local (non-harvested) datasets that haven't been exported yet.
The second will *force* a re-export of every published, local dataset, regardless of whether it has already been exported or not.

The first two calls return a status message informing the administrator that the process has been launched (``{"status":"WORKFLOW_IN_PROGRESS"}``). The administrator can check the progress of the process via log files: ``[Payara directory]/glassfish/domains/domain1/logs/export_[time stamp].log``.
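
For example, the progress of a running batch export could be followed like this (a sketch; ``$PAYARA_DIR`` is a hypothetical stand-in for the actual Payara directory of the installation):

.. code-block:: bash

  # Follow the most recent batch export log as it is written
  tail -f "$PAYARA_DIR"/glassfish/domains/domain1/logs/export_*.log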

Instead of running "reExportAll", the same can be accomplished by calling "clearExportTimestamps" followed by "exportAll".
The difference is that if the export fails prematurely for some reason, the datasets that have not been exported yet still have their timestamps cleared. A subsequent call to exportAll will skip the datasets that were already exported and try to export the ones that still need it.
Calling clearExportTimestamps should return ``{"status":"OK","data":{"message":"cleared: X"}}`` where "X" is the total number of datasets cleared.
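
A minimal sketch of that two-step sequence, using the same localhost deployment as the examples above:

.. code-block:: bash

  # Step 1: clear the export timestamps on all published, local datasets
  curl http://localhost:8080/api/admin/metadata/clearExportTimestamps
  # Step 2: export everything that now counts as un-exported;
  # if this fails part way, simply run it again
  curl http://localhost:8080/api/admin/metadata/exportAll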

The reExportDataset call gives you the opportunity to *force* a re-export of only a specific dataset and (with some script automation) could allow you to export specific batches of datasets. This might be useful when troubleshooting export problems or when reExportAll takes too much time and is overkill. Note that :ref:`export-dataset-metadata-api` is a related API.

reExportDataset can be called with either ``persistentId`` (as shown above, with a DOI) or with the database id of a dataset (as shown below, with "42" as the database id).

These calls return a status message informing the administrator, that the process has been launched (``{"status":"WORKFLOW_IN_PROGRESS"}``). The administrator can check the progress of the process via log files: ``[Payara directory]/glassfish/domains/domain1/logs/export_[time stamp].log``.
``curl http://localhost:8080/api/admin/metadata/42/reExportDataset``
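
As a sketch of the script automation mentioned above, a specific batch of datasets could be re-exported from a plain text file of persistent identifiers (the ``pids.txt`` file and its contents are hypothetical):

.. code-block:: bash

  # pids.txt holds one persistent identifier per line, e.g. doi:10.5072/FK2/AAA000
  while read -r pid; do
    curl "http://localhost:8080/api/admin/metadata/:persistentId/reExportDataset?persistentId=$pid"
  done < pids.txt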

Note that creating, modifying, or re-exporting an OAI set will also attempt to export all the unexported datasets found in the set.

4 changes: 3 additions & 1 deletion doc/sphinx-guides/source/api/native-api.rst
@@ -840,7 +840,9 @@ The fully expanded example above (without environment variables) looks like this
Export Metadata of a Dataset in Various Formats
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

|CORS| Export the metadata of the current published version of a dataset in various formats see Note below:
|CORS| Export the metadata of the current published version of a dataset in various formats.

See also :ref:`batch-exports-through-the-api` and the note below:

.. code-block:: bash
112 changes: 79 additions & 33 deletions doc/sphinx-guides/source/developers/making-releases.rst
@@ -5,61 +5,72 @@ Making Releases
.. contents:: |toctitle|
:local:

Use the number of the milestone with a "v" in front for the release tag. For example: ``v4.6.2``.
Introduction
------------

Create the release GitHub issue and branch
------------------------------------------
See :doc:`version-control` for background on our branching strategy.

The steps below describe making both normal releases and hotfix releases.

Write Release Notes
-------------------

Developers express the need for an addition to release notes by creating a file in ``/doc/release-notes`` containing the name of the issue they're working on. The name of the branch could be used for the filename with ".md" appended (release notes are written in Markdown) such as ``5053-apis-custom-homepage.md``.
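
For instance, such a snippet could be added like this (a sketch; the file name is the example above and the note text is invented for illustration):

.. code-block:: bash

  # One short Markdown file per issue, named after the branch
  echo "The custom homepage can now be configured via new APIs." > doc/release-notes/5053-apis-custom-homepage.md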

The task at or near release time is to collect these notes into a single doc.

- Create an issue in GitHub to track the work of creating release notes for the upcoming release.
- Create a branch, add a .md file for the release (ex. 5.10.1 Release Notes) in ``/doc/release-notes`` and write the release notes, making sure to pull content from the issue-specific release notes mentioned above.
- Delete the previously-created, issue-specific release notes as the content is added to the main release notes file.
- Take the release notes .md through the regular Code Review and QA process.

Use the GitHub issue number and the release tag for the name of the branch.
For example: 4734-update-v-4.8.6-to-4.9
Create a GitHub Issue and Branch for the Release
------------------------------------------------

Usually we branch from the "develop" branch to create the release branch. If we are creating a hotfix for a particular version (5.11, for example), we branch from the tag (e.g. ``v5.11``).

Use the GitHub issue number and the release tag for the name of the branch (e.g. ``8583-update-version-to-v5.10.1``).
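
A minimal sketch of creating the branch (the branch names are illustrative; the exact commands are an assumption):

.. code-block:: bash

  # Normal release: branch from develop
  git checkout develop && git pull
  git checkout -b 8583-update-version-to-v5.10.1

  # Hotfix release: branch from the tag of the version being patched instead
  git checkout -b 5.11.1-hotfix v5.11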

**Note:** the changes below must be the very last commits merged into the develop branch before it is merged into master and tagged for the release!

Make the following changes in the release branch:
Make the following changes in the release branch.

1. Bump Version Numbers
=======================
Bump Version Numbers
--------------------

Increment the version number to the milestone (e.g. 4.6.2) in the following two files:
Increment the version number to the milestone (e.g. 5.10.1) in the following two files:

- modules/dataverse-parent/pom.xml -> ``<properties>`` -> ``<revision>``
- doc/sphinx-guides/source/conf.py (two places)
- modules/dataverse-parent/pom.xml -> ``<properties>`` -> ``<revision>`` (e.g. `pom.xml commit <https://github.com/IQSS/dataverse/commit/3943aa0>`_)
- doc/sphinx-guides/source/conf.py (two places, e.g. `conf.py commit <https://github.com/IQSS/dataverse/commit/18fd296>`_)
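
A sketch of making these edits from the command line (the ``sed`` patterns are assumptions about the current contents of the files, so check the result with ``git diff``):

.. code-block:: bash

  NEW_VERSION=5.10.1
  # <revision> property in the parent pom
  sed -i "s|<revision>.*</revision>|<revision>${NEW_VERSION}</revision>|" modules/dataverse-parent/pom.xml
  # version and release strings in the Sphinx config (the "two places")
  sed -i "s|^version = .*|version = '${NEW_VERSION}'|" doc/sphinx-guides/source/conf.py
  sed -i "s|^release = .*|release = '${NEW_VERSION}'|" doc/sphinx-guides/source/conf.py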

Add the version being released to the lists in the following two files:

- doc/sphinx-guides/source/versions.rst
- scripts/database/releases.txt
- doc/sphinx-guides/source/versions.rst (e.g. `versions.rst commit <https://github.com/IQSS/dataverse/commit/0511245>`_)

Here's an example commit where three of the four files above were updated at once: https://github.com/IQSS/dataverse/commit/99e23f96ec362ac2f524cb5cd80ca375fa13f196
(Note: the version has been moved to a property in parent module since this commit was created.)
Check in the Changes Above into a Release Branch and Merge It
-------------------------------------------------------------

2. Check in the Changes Above...
================================
For any ordinary release, make the changes above in the release branch you created, make a pull request, and merge it into the "develop" branch. As usual, you can safely delete the branch after the merge is complete.

... into the release branch, make a pull request and merge the release branch into develop.
If you are making a hotfix release, make the pull request against the "master" branch. Do not delete the branch after merging because we will later merge it into the "develop" branch to pick up the hotfix. More on this later.

Either way, as usual, you should ensure that all tests are passing. Please note that you might need to bump the version in `jenkins.yml <https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible/blob/develop/tests/group_vars/jenkins.yml>`_ in dataverse-ansible to get the tests to run.

Merge "develop" into "master"
-----------------------------

The "develop" branch should be merged into "master" before tagging. See also the branching strategy described in the :doc:`version-control` section.
Note: If you are making a hotfix release, the "develop" branch is not involved so you can skip this step.

Write Release Notes
-------------------
The "develop" branch should be merged into "master" before tagging.

Developers should express the need for an addition to release notes by creating a file in ``/doc/release-notes`` containing the name of the issue they're working on. The name of the branch could be used for the filename with ".md" appended (release notes are written in Markdown) such as ``5053-apis-custom-homepage.md``.
Create a Draft Release on GitHub
--------------------------------

At or near release time:
Create a draft release at https://github.com/IQSS/dataverse/releases/new

- Create an issue in Github to track the work of creating release notes for the upcoming release
- Create a branch, add a .md file for the release (ex. 4.16 Release Notes) in ``/doc/release-notes`` and write the release notes, making sure to pull content from the issue-specific release notes mentioned above
- Delete the previously-created, issue-specific release notes as the content is added to the main release notes file
- Take the release notes .md through the regular Code Review and QA process
- Create a draft release at https://github.com/IQSS/dataverse/releases/new
- The "tag version" and "title" should be the number of the milestone with a "v" in front (i.e. v4.16).
- Copy in the content from the .md file
- For the description, follow post-4.16 examples at https://github.com/IQSS/dataverse/releases
The "tag version" and "title" should be the number of the milestone with a "v" in front (i.e. v5.10.1).

Copy in the content from the .md file created in the "Write Release Notes" steps above.
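
If the GitHub CLI is installed, the tag, title, and notes can also be supplied from the command line (a sketch; the release notes file name is hypothetical, and the web form described above works just as well):

.. code-block:: bash

  gh release create v5.10.1 --draft --title "v5.10.1" \
    --notes-file doc/release-notes/5.10.1-release-notes.md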

Make Artifacts Available for Download
-------------------------------------
@@ -70,11 +81,46 @@ Upload the following artifacts to the draft release you created:
- installer (``cd scripts/installer && make``)
- other files as needed, such as updated Solr schema and config files
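
A sketch of producing these artifacts locally (it assumes the war file is built with a standard ``mvn package`` from the repository root; verify against the release checklist for the version at hand):

.. code-block:: bash

  # Build the war file (ends up under target/)
  mvn package
  # Build the installer bundle
  cd scripts/installer && make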

Publish Release
---------------
Publish the Release
-------------------

Click the "Publish release" button.

Close Milestone on GitHub and Create a New One
----------------------------------------------

You can find our milestones at https://github.com/IQSS/dataverse/milestones

Now that we've published the release, close the milestone and create a new one.

Note that for milestones we use just the number without the "v" (e.g. "5.10.1").

Add the Release to the Dataverse Roadmap
----------------------------------------

Add an entry to the list of releases at https://www.iq.harvard.edu/roadmap-dataverse-project

Announce the Release on the Dataverse Blog
------------------------------------------

Make a blog post at https://dataverse.org/blog

Announce the Release on the Mailing List
----------------------------------------

Post a message at https://groups.google.com/g/dataverse-community

For Hotfixes, Merge Hotfix Branch into "develop" and Rename SQL Scripts
-----------------------------------------------------------------------

Note: this only applies to hotfixes!

We've merged the hotfix into the "master" branch, but now we need the fixes (and the version bump) in the "develop" branch. Make a new branch off the hotfix branch and create a pull request against develop. Merge conflicts are possible, and this pull request should go through review and QA as normal. Afterwards, it's fine to delete this branch and the hotfix branch that was merged into master.

Because of the hotfix version, any SQL scripts in "develop" should be renamed (from "5.11.0" to "5.11.1" for example). To read more about our naming conventions for SQL scripts, see :doc:`sql-upgrade-scripts`.
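
A sketch of such a rename (the script name is hypothetical; check the actual Flyway scripts under ``src/main/resources/db/migration``):

.. code-block:: bash

  # Example only: move a script from the 5.11.0 prefix to the 5.11.1 prefix
  git mv src/main/resources/db/migration/V5.11.0.1__1234-example.sql \
         src/main/resources/db/migration/V5.11.1.1__1234-example.sql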

Please note that version bumps and SQL script renaming both require all open pull requests to be updated with the latest from the "develop" branch, so you might want to add any SQL script renaming to the hotfix branch before you put it through QA to be merged with develop. This way, open pull requests only need to be updated once.

----

Previous: :doc:`containers` | Next: :doc:`tools`
5 changes: 5 additions & 0 deletions doc/sphinx-guides/source/developers/version-control.rst
@@ -46,6 +46,11 @@ Feature branches are used for both developing features and fixing bugs. They are

"3728-doc-apipolicy-fix" is an example of a fine name for your feature branch. It tells us that you are addressing https://github.com/IQSS/dataverse/issues/3728 and the "slug" is short, descriptive, and starts with the issue number.

Hotfix Branches
***************

Hotfix branches are described under :doc:`making-releases`.

.. _how-to-make-a-pull-request:

How to Make a Pull Request
6 changes: 3 additions & 3 deletions modules/dataverse-parent/pom.xml
@@ -147,10 +147,10 @@

<!-- Major system components and dependencies -->
<payara.version>5.2022.3</payara.version>
<postgresql.version>42.3.5</postgresql.version>
<postgresql.version>42.5.0</postgresql.version>
<solr.version>8.11.1</solr.version>
<aws.version>1.11.762</aws.version>
<google.cloud.version>0.157.0</google.cloud.version>
<aws.version>1.12.290</aws.version>
<google.cloud.version>0.177.0</google.cloud.version>

<!-- Basic libs, logging -->
<jakartaee-api.version>8.0.0</jakartaee-api.version>
6 changes: 3 additions & 3 deletions pom.xml
@@ -117,7 +117,7 @@
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.2.4</version>
<version>2.8.9</version>
<scope>compile</scope>
</dependency>
<!-- Should be refactored and moved to transitive section above once on Java EE 8 (makes WAR smaller) -->
@@ -347,7 +347,7 @@
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.14.2</version>
<version>1.15.3</version>
</dependency>
<dependency>
<groupId>io.searchbox</groupId>
@@ -380,7 +380,7 @@
<dependency>
<groupId>com.nimbusds</groupId>
<artifactId>oauth2-oidc-sdk</artifactId>
<version>9.9.1</version>
<version>9.41.1</version>
</dependency>
<!-- New and Improved GDCC XOAI library! -->
<dependency>
36 changes: 36 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/DatasetServiceBean.java
@@ -802,6 +802,35 @@ public void exportAllDatasets(boolean forceReExport) {

}


@Asynchronous
public void reExportDatasetAsync(Dataset dataset) {
exportDataset(dataset, true);
}

public void exportDataset(Dataset dataset, boolean forceReExport) {
if (dataset != null) {
// Note that the logic for handling a dataset is similar to what is implemented in exportAllDatasets,
// but when only one dataset is exported we do not log in a separate export logging file
if (dataset.isReleased() && dataset.getReleasedVersion() != null && !dataset.isDeaccessioned()) {

// can't trust dataset.getPublicationDate(), no.
Date publicationDate = dataset.getReleasedVersion().getReleaseTime(); // we know this dataset has a non-null released version! Maybe not - SEK 8/19 (We do now! :)
if (forceReExport || (publicationDate != null
&& (dataset.getLastExportTime() == null
|| dataset.getLastExportTime().before(publicationDate)))) {
try {
recordService.exportAllFormatsInNewTransaction(dataset);
logger.info("Success exporting dataset: " + dataset.getDisplayName() + " " + dataset.getGlobalIdString());
} catch (Exception ex) {
logger.info("Error exporting dataset: " + dataset.getDisplayName() + " " + dataset.getGlobalIdString() + "; " + ex.getMessage());
}
}
}
}

}

public String getReminderString(Dataset dataset, boolean canPublishDataset) {
return getReminderString( dataset, canPublishDataset, false);
}
@@ -842,6 +871,13 @@ public String getReminderString(Dataset dataset, boolean canPublishDataset, bool
}
}

@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
public int clearAllExportTimes() {
Query clearExportTimes = em.createQuery("UPDATE Dataset SET lastExportTime = NULL");
int numRowsUpdated = clearExportTimes.executeUpdate();
return numRowsUpdated;
}

public Dataset setNonDatasetFileAsThumbnail(Dataset dataset, InputStream inputStream) {
if (dataset == null) {
logger.fine("In setNonDatasetFileAsThumbnail but dataset is null! Returning null.");
38 changes: 32 additions & 6 deletions src/main/java/edu/harvard/iq/dataverse/api/Metadata.java
@@ -5,19 +5,25 @@
*/
package edu.harvard.iq.dataverse.api;

import edu.harvard.iq.dataverse.Dataset;
import edu.harvard.iq.dataverse.DatasetServiceBean;

import java.io.IOException;
import java.util.concurrent.Future;
import java.util.logging.Logger;
import javax.ejb.EJB;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.json.Json;
import javax.json.JsonArrayBuilder;
import javax.json.JsonObjectBuilder;
import javax.ws.rs.*;
import javax.ws.rs.core.Response;

import javax.ws.rs.core.Response;
import javax.ws.rs.PathParam;
import javax.ws.rs.PUT;

import edu.harvard.iq.dataverse.DatasetVersion;
import edu.harvard.iq.dataverse.harvest.server.OAISetServiceBean;
import edu.harvard.iq.dataverse.harvest.server.OAISet;
import org.apache.solr.client.solrj.SolrServerException;

/**
*
@@ -59,7 +65,27 @@ public Response exportAll() {
public Response reExportAll() {
datasetService.reExportAllAsync();
return this.accepted();
}
}

@GET
@Path("{id}/reExportDataset")
public Response indexDatasetByPersistentId(@PathParam("id") String id) {
try {
Dataset dataset = findDatasetOrDie(id);
datasetService.reExportDatasetAsync(dataset);
return ok("export started");
} catch (WrappedResponse wr) {
return wr.getResponse();
}
}

@GET
@Path("clearExportTimestamps")
public Response clearExportTimestamps() {
// only clear the timestamp in the database, cached metadata export files are not deleted
int numItemsCleared = datasetService.clearAllExportTimes();
return ok("cleared: " + numItemsCleared);
}

/**
* initial attempt at triggering indexing/creation/population of a OAI set without going throught