Merge branch 'develop' into 8372-gdcc-xoai-library
landreev committed Sep 28, 2022
2 parents 5ab41b0 + 7c1683b commit e448b68
Showing 10 changed files with 205 additions and 54 deletions.
26 changes: 21 additions & 5 deletions doc/sphinx-guides/source/admin/metadataexport.rst
@@ -11,19 +11,35 @@ Publishing a dataset automatically starts a metadata export job, that will run i

A scheduled timer job that runs nightly will attempt to export any published datasets that for whatever reason haven't been exported yet. This timer is activated automatically on the deployment, or restart, of the application. So, again, no need to start or configure it manually. (See the :doc:`timers` section of this Admin Guide for more information.)

Batch exports through the API
.. _batch-exports-through-the-api:

Batch Exports Through the API
-----------------------------

In addition to the automated exports, a Dataverse installation admin can start a batch job through the API. The following 2 API calls are provided:
In addition to the automated exports, a Dataverse installation admin can start a batch job through the API. The following four API calls are provided:

``curl http://localhost:8080/api/admin/metadata/exportAll``

``curl http://localhost:8080/api/admin/metadata/reExportAll``

The former will attempt to export all the published, local (non-harvested) datasets that haven't been exported yet.
The latter will *force* a re-export of every published, local dataset, regardless of whether it has already been exported or not.
``curl http://localhost:8080/api/admin/metadata/clearExportTimestamps``

``curl http://localhost:8080/api/admin/metadata/:persistentId/reExportDataset?persistentId=doi:10.5072/FK2/AAA000``

The first will attempt to export all the published, local (non-harvested) datasets that haven't been exported yet.
The second will *force* a re-export of every published, local dataset, regardless of whether it has already been exported or not.

The first two calls return a status message informing the administrator that the process has been launched (``{"status":"WORKFLOW_IN_PROGRESS"}``). The administrator can check the progress of the process via log files: ``[Payara directory]/glassfish/domains/domain1/logs/export_[time stamp].log``.
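
For example, the progress of a running batch export could be followed like this (a sketch; ``$PAYARA_DIR`` is a hypothetical stand-in for the actual Payara directory of the installation):

.. code-block:: bash

  # Follow the most recent batch export log as it is written
  tail -f "$PAYARA_DIR"/glassfish/domains/domain1/logs/export_*.log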

Instead of running "reExportAll", the same can be accomplished by calling "clearExportTimestamps" followed by "exportAll".
The difference is that if the export fails prematurely for some reason, the datasets that have not been exported yet still have their timestamps cleared. A subsequent call to exportAll will skip the datasets that were already exported and try to export the ones that still need it.
Calling clearExportTimestamps should return ``{"status":"OK","data":{"message":"cleared: X"}}`` where "X" is the total number of datasets cleared.
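
A minimal sketch of that two-step sequence, using the same localhost deployment as the examples above:

.. code-block:: bash

  # Step 1: clear the export timestamps on all published, local datasets
  curl http://localhost:8080/api/admin/metadata/clearExportTimestamps
  # Step 2: export everything that now counts as un-exported;
  # if this fails part way, simply run it again
  curl http://localhost:8080/api/admin/metadata/exportAll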

The reExportDataset call gives you the opportunity to *force* a re-export of only a specific dataset and (with some script automation) could allow you to export specific batches of datasets. This might be useful when troubleshooting export problems or when reExportAll takes too much time and is overkill. Note that :ref:`export-dataset-metadata-api` is a related API.

reExportDataset can be called with either ``persistentId`` (as shown above, with a DOI) or with the database id of a dataset (as shown below, with "42" as the database id).

These calls return a status message informing the administrator, that the process has been launched (``{"status":"WORKFLOW_IN_PROGRESS"}``). The administrator can check the progress of the process via log files: ``[Payara directory]/glassfish/domains/domain1/logs/export_[time stamp].log``.
``curl http://localhost:8080/api/admin/metadata/42/reExportDataset``
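
As a sketch of the script automation mentioned above, a specific batch of datasets could be re-exported from a plain text file of persistent identifiers (the ``pids.txt`` file and its contents are hypothetical):

.. code-block:: bash

  # pids.txt holds one persistent identifier per line, e.g. doi:10.5072/FK2/AAA000
  while read -r pid; do
    curl "http://localhost:8080/api/admin/metadata/:persistentId/reExportDataset?persistentId=$pid"
  done < pids.txt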

Note that creating, modifying, or re-exporting an OAI set will also attempt to export all the unexported datasets found in the set.

4 changes: 3 additions & 1 deletion doc/sphinx-guides/source/api/native-api.rst
@@ -840,7 +840,9 @@ The fully expanded example above (without environment variables) looks like this
Export Metadata of a Dataset in Various Formats
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

|CORS| Export the metadata of the current published version of a dataset in various formats see Note below:
|CORS| Export the metadata of the current published version of a dataset in various formats.

See also :ref:`batch-exports-through-the-api` and the note below:

.. code-block:: bash
112 changes: 79 additions & 33 deletions doc/sphinx-guides/source/developers/making-releases.rst
@@ -5,61 +5,72 @@ Making Releases
.. contents:: |toctitle|
:local:

Use the number of the milestone with a "v" in front for the release tag. For example: ``v4.6.2``.
Introduction
------------

Create the release GitHub issue and branch
------------------------------------------
See :doc:`version-control` for background on our branching strategy.

The steps below describe making both normal releases and hotfix releases.

Write Release Notes
-------------------

Developers express the need for an addition to release notes by creating a file in ``/doc/release-notes`` containing the name of the issue they're working on. The name of the branch could be used for the filename with ".md" appended (release notes are written in Markdown) such as ``5053-apis-custom-homepage.md``.
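
For instance, such a snippet could be added like this (a sketch; the file name is the example above and the note text is invented for illustration):

.. code-block:: bash

  # One short Markdown file per issue, named after the branch
  echo "The custom homepage can now be configured via new APIs." > doc/release-notes/5053-apis-custom-homepage.md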

The task at or near release time is to collect these notes into a single doc.

- Create an issue in GitHub to track the work of creating release notes for the upcoming release.
- Create a branch, add a .md file for the release (ex. 5.10.1 Release Notes) in ``/doc/release-notes`` and write the release notes, making sure to pull content from the issue-specific release notes mentioned above.
- Delete the previously-created, issue-specific release notes as the content is added to the main release notes file.
- Take the release notes .md through the regular Code Review and QA process.

Use the GitHub issue number and the release tag for the name of the branch.
For example: 4734-update-v-4.8.6-to-4.9
Create a GitHub Issue and Branch for the Release
------------------------------------------------

Usually we branch from the "develop" branch to create the release branch. If we are creating a hotfix for a particular version (5.11, for example), we branch from the tag (e.g. ``v5.11``).

Use the GitHub issue number and the release tag for the name of the branch (e.g. ``8583-update-version-to-v5.10.1``).
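
A minimal sketch of creating the branch (the branch names are illustrative; the exact commands are an assumption):

.. code-block:: bash

  # Normal release: branch from develop
  git checkout develop && git pull
  git checkout -b 8583-update-version-to-v5.10.1

  # Hotfix release: branch from the tag of the version being patched instead
  git checkout -b 5.11.1-hotfix v5.11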

**Note:** the changes below must be the very last commits merged into the develop branch before it is merged into master and tagged for the release!

Make the following changes in the release branch:
Make the following changes in the release branch.

1. Bump Version Numbers
=======================
Bump Version Numbers
--------------------

Increment the version number to the milestone (e.g. 4.6.2) in the following two files:
Increment the version number to the milestone (e.g. 5.10.1) in the following two files:

- modules/dataverse-parent/pom.xml -> ``<properties>`` -> ``<revision>``
- doc/sphinx-guides/source/conf.py (two places)
- modules/dataverse-parent/pom.xml -> ``<properties>`` -> ``<revision>`` (e.g. `pom.xml commit <https://github.com/IQSS/dataverse/commit/3943aa0>`_)
- doc/sphinx-guides/source/conf.py (two places, e.g. `conf.py commit <https://github.com/IQSS/dataverse/commit/18fd296>`_)
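
A sketch of making these edits from the command line (the ``sed`` patterns are assumptions about the current contents of the files, so check the result with ``git diff``):

.. code-block:: bash

  NEW_VERSION=5.10.1
  # <revision> property in the parent pom
  sed -i "s|<revision>.*</revision>|<revision>${NEW_VERSION}</revision>|" modules/dataverse-parent/pom.xml
  # version and release strings in the Sphinx config (the "two places")
  sed -i "s|^version = .*|version = '${NEW_VERSION}'|" doc/sphinx-guides/source/conf.py
  sed -i "s|^release = .*|release = '${NEW_VERSION}'|" doc/sphinx-guides/source/conf.py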

Add the version being released to the lists in the following two files:

- doc/sphinx-guides/source/versions.rst
- scripts/database/releases.txt
- doc/sphinx-guides/source/versions.rst (e.g. `versions.rst commit <https://github.com/IQSS/dataverse/commit/0511245>`_)

Here's an example commit where three of the four files above were updated at once: https://github.com/IQSS/dataverse/commit/99e23f96ec362ac2f524cb5cd80ca375fa13f196
(Note: the version has been moved to a property in parent module since this commit was created.)
Check in the Changes Above into a Release Branch and Merge It
-------------------------------------------------------------

2. Check in the Changes Above...
================================
For any ordinary release, make the changes above in the release branch you created, make a pull request, and merge it into the "develop" branch. As usual, you can safely delete the branch after the merge is complete.

... into the release branch, make a pull request and merge the release branch into develop.
If you are making a hotfix release, make the pull request against the "master" branch. Do not delete the branch after merging because we will later merge it into the "develop" branch to pick up the hotfix. More on this later.

Either way, as usual, you should ensure that all tests are passing. Please note that you might need to bump the version in `jenkins.yml <https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible/blob/develop/tests/group_vars/jenkins.yml>`_ in dataverse-ansible to get the tests to run.

Merge "develop" into "master"
-----------------------------

The "develop" branch should be merged into "master" before tagging. See also the branching strategy described in the :doc:`version-control` section.
Note: If you are making a hotfix release, the "develop" branch is not involved so you can skip this step.

Write Release Notes
-------------------
The "develop" branch should be merged into "master" before tagging.

Developers should express the need for an addition to release notes by creating a file in ``/doc/release-notes`` containing the name of the issue they're working on. The name of the branch could be used for the filename with ".md" appended (release notes are written in Markdown) such as ``5053-apis-custom-homepage.md``.
Create a Draft Release on GitHub
--------------------------------

At or near release time:
Create a draft release at https://github.com/IQSS/dataverse/releases/new

- Create an issue in Github to track the work of creating release notes for the upcoming release
- Create a branch, add a .md file for the release (ex. 4.16 Release Notes) in ``/doc/release-notes`` and write the release notes, making sure to pull content from the issue-specific release notes mentioned above
- Delete the previously-created, issue-specific release notes as the content is added to the main release notes file
- Take the release notes .md through the regular Code Review and QA process
- Create a draft release at https://github.com/IQSS/dataverse/releases/new
- The "tag version" and "title" should be the number of the milestone with a "v" in front (i.e. v4.16).
- Copy in the content from the .md file
- For the description, follow post-4.16 examples at https://github.com/IQSS/dataverse/releases
The "tag version" and "title" should be the number of the milestone with a "v" in front (i.e. v5.10.1).

Copy in the content from the .md file created in the "Write Release Notes" steps above.
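
If the GitHub CLI is installed, the tag, title, and notes can also be supplied from the command line (a sketch; the release notes file name is hypothetical, and the web form described above works just as well):

.. code-block:: bash

  gh release create v5.10.1 --draft --title "v5.10.1" \
    --notes-file doc/release-notes/5.10.1-release-notes.md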

Make Artifacts Available for Download
-------------------------------------
@@ -70,11 +81,46 @@ Upload the following artifacts to the draft release you created:
- installer (``cd scripts/installer && make``)
- other files as needed, such as updated Solr schema and config files
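
A sketch of producing these artifacts locally (it assumes the war file is built with a standard ``mvn package`` from the repository root; verify against the release checklist for the version at hand):

.. code-block:: bash

  # Build the war file (ends up under target/)
  mvn package
  # Build the installer bundle
  cd scripts/installer && make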

Publish Release
---------------
Publish the Release
-------------------

Click the "Publish release" button.

Close Milestone on GitHub and Create a New One
----------------------------------------------

You can find our milestones at https://github.com/IQSS/dataverse/milestones

Now that we've published the release, close the milestone and create a new one.

Note that for milestones we use just the number without the "v" (e.g. "5.10.1").

Add the Release to the Dataverse Roadmap
----------------------------------------

Add an entry to the list of releases at https://www.iq.harvard.edu/roadmap-dataverse-project

Announce the Release on the Dataverse Blog
------------------------------------------

Make a blog post at https://dataverse.org/blog

Announce the Release on the Mailing List
----------------------------------------

Post a message at https://groups.google.com/g/dataverse-community

For Hotfixes, Merge Hotfix Branch into "develop" and Rename SQL Scripts
-----------------------------------------------------------------------

Note: this only applies to hotfixes!

We've merged the hotfix into the "master" branch, but now we need the fixes (and the version bump) in the "develop" branch. Make a new branch off the hotfix branch and create a pull request against develop. Merge conflicts are possible, and this pull request should go through review and QA as normal. Afterwards, it's fine to delete this branch and the hotfix branch that was merged into master.

Because of the hotfix version, any SQL scripts in "develop" should be renamed (from "5.11.0" to "5.11.1" for example). To read more about our naming conventions for SQL scripts, see :doc:`sql-upgrade-scripts`.
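
A sketch of such a rename (the script name is hypothetical; check the actual Flyway scripts under ``src/main/resources/db/migration``):

.. code-block:: bash

  # Example only: move a script from the 5.11.0 prefix to the 5.11.1 prefix
  git mv src/main/resources/db/migration/V5.11.0.1__1234-example.sql \
         src/main/resources/db/migration/V5.11.1.1__1234-example.sql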

Please note that version bumps and SQL script renaming both require all open pull requests to be updated with the latest from the "develop" branch, so you might want to add any SQL script renaming to the hotfix branch before you put it through QA to be merged with develop. This way, open pull requests only need to be updated once.

----

Previous: :doc:`containers` | Next: :doc:`tools`
5 changes: 5 additions & 0 deletions doc/sphinx-guides/source/developers/version-control.rst
@@ -46,6 +46,11 @@ Feature branches are used for both developing features and fixing bugs. They are

"3728-doc-apipolicy-fix" is an example of a fine name for your feature branch. It tells us that you are addressing https://github.com/IQSS/dataverse/issues/3728 and the "slug" is short, descriptive, and starts with the issue number.

Hotfix Branches
***************

Hotfix branches are described under :doc:`making-releases`.

.. _how-to-make-a-pull-request:

How to Make a Pull Request
6 changes: 3 additions & 3 deletions modules/dataverse-parent/pom.xml
@@ -147,10 +147,10 @@

<!-- Major system components and dependencies -->
<payara.version>5.2022.3</payara.version>
<postgresql.version>42.3.5</postgresql.version>
<postgresql.version>42.5.0</postgresql.version>
<solr.version>8.11.1</solr.version>
<aws.version>1.11.762</aws.version>
<google.cloud.version>0.157.0</google.cloud.version>
<aws.version>1.12.290</aws.version>
<google.cloud.version>0.177.0</google.cloud.version>

<!-- Basic libs, logging -->
<jakartaee-api.version>8.0.0</jakartaee-api.version>
6 changes: 3 additions & 3 deletions pom.xml
@@ -117,7 +117,7 @@
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.2.4</version>
<version>2.8.9</version>
<scope>compile</scope>
</dependency>
<!-- Should be refactored and moved to transitive section above once on Java EE 8 (makes WAR smaller) -->
@@ -347,7 +347,7 @@
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.14.2</version>
<version>1.15.3</version>
</dependency>
<dependency>
<groupId>io.searchbox</groupId>
@@ -380,7 +380,7 @@
<dependency>
<groupId>com.nimbusds</groupId>
<artifactId>oauth2-oidc-sdk</artifactId>
<version>9.9.1</version>
<version>9.41.1</version>
</dependency>
<!-- New and Improved GDCC XOAI library! -->
<dependency>
36 changes: 36 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/DatasetServiceBean.java
@@ -802,6 +802,35 @@ public void exportAllDatasets(boolean forceReExport) {

}


@Asynchronous
public void reExportDatasetAsync(Dataset dataset) {
exportDataset(dataset, true);
}

public void exportDataset(Dataset dataset, boolean forceReExport) {
if (dataset != null) {
// Note that the logic for handling a dataset is similar to what is implemented in exportAllDatasets,
// but when only one dataset is exported we do not log in a separate export logging file
if (dataset.isReleased() && dataset.getReleasedVersion() != null && !dataset.isDeaccessioned()) {

// can't trust dataset.getPublicationDate(), no.
Date publicationDate = dataset.getReleasedVersion().getReleaseTime(); // we know this dataset has a non-null released version! Maybe not - SEK 8/19 (We do now! :)
if (forceReExport || (publicationDate != null
&& (dataset.getLastExportTime() == null
|| dataset.getLastExportTime().before(publicationDate)))) {
try {
recordService.exportAllFormatsInNewTransaction(dataset);
logger.info("Success exporting dataset: " + dataset.getDisplayName() + " " + dataset.getGlobalIdString());
} catch (Exception ex) {
logger.info("Error exporting dataset: " + dataset.getDisplayName() + " " + dataset.getGlobalIdString() + "; " + ex.getMessage());
}
}
}
}

}

public String getReminderString(Dataset dataset, boolean canPublishDataset) {
return getReminderString( dataset, canPublishDataset, false);
}
@@ -842,6 +871,13 @@ public String getReminderString(Dataset dataset, boolean canPublishDataset, bool
}
}

@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
public int clearAllExportTimes() {
Query clearExportTimes = em.createQuery("UPDATE Dataset SET lastExportTime = NULL");
int numRowsUpdated = clearExportTimes.executeUpdate();
return numRowsUpdated;
}

public Dataset setNonDatasetFileAsThumbnail(Dataset dataset, InputStream inputStream) {
if (dataset == null) {
logger.fine("In setNonDatasetFileAsThumbnail but dataset is null! Returning null.");
38 changes: 32 additions & 6 deletions src/main/java/edu/harvard/iq/dataverse/api/Metadata.java
@@ -5,19 +5,25 @@
*/
package edu.harvard.iq.dataverse.api;

import edu.harvard.iq.dataverse.Dataset;
import edu.harvard.iq.dataverse.DatasetServiceBean;

import java.io.IOException;
import java.util.concurrent.Future;
import java.util.logging.Logger;
import javax.ejb.EJB;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.json.Json;
import javax.json.JsonArrayBuilder;
import javax.json.JsonObjectBuilder;
import javax.ws.rs.*;
import javax.ws.rs.core.Response;

import javax.ws.rs.core.Response;
import javax.ws.rs.PathParam;
import javax.ws.rs.PUT;

import edu.harvard.iq.dataverse.DatasetVersion;
import edu.harvard.iq.dataverse.harvest.server.OAISetServiceBean;
import edu.harvard.iq.dataverse.harvest.server.OAISet;
import org.apache.solr.client.solrj.SolrServerException;

/**
*
@@ -59,7 +65,27 @@ public Response exportAll() {
public Response reExportAll() {
datasetService.reExportAllAsync();
return this.accepted();
}
}

@GET
@Path("{id}/reExportDataset")
public Response indexDatasetByPersistentId(@PathParam("id") String id) {
try {
Dataset dataset = findDatasetOrDie(id);
datasetService.reExportDatasetAsync(dataset);
return ok("export started");
} catch (WrappedResponse wr) {
return wr.getResponse();
}
}

@GET
@Path("clearExportTimestamps")
public Response clearExportTimestamps() {
// only clear the timestamp in the database, cached metadata export files are not deleted
int numItemsCleared = datasetService.clearAllExportTimes();
return ok("cleared: " + numItemsCleared);
}

/**
* initial attempt at triggering indexing/creation/population of a OAI set without going throught