Iqss/4706 dpn submission of archival copies #5049

Merged
Changes from all commits
Commits (83)
d70be10
Workflow enhancements
qqmyers Sep 11, 2018
ec53d91
Merge branch 'IQSS/5033' into QDR-953-dpn
qqmyers Sep 11, 2018
d7ea814
Merge branch 'QDR-953_streaming_exports' into QDR-953-dpn
qqmyers Sep 11, 2018
a4135a0
Merge branch 'QDR-953-workflow' into QDR-953-dpn
qqmyers Sep 11, 2018
4aa8852
update to schema v4
qqmyers Sep 11, 2018
5277bd0
DPN submission - api, gui, and workflow
qqmyers Sep 11, 2018
72e08a9
submission api
qqmyers Sep 14, 2018
6e2e9d9
removing GUI and adding info to API return to provide URL
qqmyers Sep 15, 2018
56fcfc3
Merge remote-tracking branch 'IQSS/develop' into QDR-953-dpn
qqmyers Sep 15, 2018
d6d987f
Merge branch 'IQSS/5033' into QDR-953-dpn
qqmyers Sep 18, 2018
31bf8f7
Simplifying core terms/ using static namespace
qqmyers Sep 20, 2018
0c3d346
draft tsv mapping
qqmyers Sep 20, 2018
bf12024
datcite terms added
qqmyers Sep 20, 2018
e2765fe
Simplifying core terms/ using static namespace/draft tsv mapping
qqmyers Sep 20, 2018
1dd6382
Merge branch 'IQSS/4706-DPN_Submission_of_archival_copies' of https:/…
qqmyers Sep 20, 2018
5c16850
Merge branch 'QDR-953_streaming_exports' into QDR-953-dpn
qqmyers Sep 26, 2018
191e400
refactor, new examples
qqmyers Sep 26, 2018
985c1ef
Merge branch 'QDR-953_streaming_exports' into QDR-953-dpn
qqmyers Oct 11, 2018
f2f1189
missing import
qqmyers Oct 11, 2018
bdccdfe
prepare for PR 5192
qqmyers Oct 19, 2018
4d7d4d1
Merge remote-tracking branch 'IQSS/develop' into QDR-953-dpn
qqmyers Oct 22, 2018
250fed0
Merge remote-tracking branch 'IQSS/develop' into QDR-953-dpn
qqmyers Nov 2, 2018
1e86450
move db update command to 4.9.10 script
qqmyers Nov 2, 2018
f7940d1
uncomment change for 5192
qqmyers Nov 2, 2018
d05d074
and remove old code...
qqmyers Nov 2, 2018
88243f5
support friendly ver in api
qqmyers Nov 2, 2018
8c36213
initial admin doc
qqmyers Nov 2, 2018
a86f997
need Long
qqmyers Nov 2, 2018
f2b8c43
use new permission
qqmyers Nov 5, 2018
d40668b
further documentation
qqmyers Nov 5, 2018
886f1fc
typos from config
qqmyers Nov 5, 2018
0f543f2
missed Long to Stirng param change to support ":persistentId"
qqmyers Nov 5, 2018
86b7652
give admin permission to submit to DPN
qqmyers Nov 5, 2018
39dc688
revert per IQSS
qqmyers Nov 26, 2018
b3b439e
adding comment about unused method
qqmyers Nov 26, 2018
1e74756
remove superuser restriction on archiveVersion method
qqmyers Nov 26, 2018
86c7ceb
remove archive permission, use Permission.PublishDataset
qqmyers Nov 26, 2018
c673bbf
Merge remote-tracking branch 'IQSS/develop' into QDR-953-dpn
qqmyers Nov 26, 2018
0aaf875
attempt to separate workflow docs per IQSS
qqmyers Nov 26, 2018
20d6f98
updates to generalize archiver
qqmyers Nov 28, 2018
c6a3343
partial update
qqmyers Nov 28, 2018
c574f42
reflective instantiation of archiver
qqmyers Nov 28, 2018
bd94c1b
updates of documentation and examples
qqmyers Nov 28, 2018
7502211
Merge remote-tracking branch 'IQSS/develop' into QDR-953-dpn
qqmyers Nov 29, 2018
5f2e1b0
Merge remote-tracking branch 'IQSS/develop' into QDR-953-dpn
qqmyers Nov 30, 2018
1d5709e
refactor to move abstract class to impl, del old workflow, fix docs
qqmyers Dec 3, 2018
0049ac6
workflow printing typo
qqmyers Dec 3, 2018
ffea9a2
typo and doc fixes
qqmyers Dec 3, 2018
25ebf5d
remove comment
qqmyers Dec 5, 2018
b6692c0
closer mirror to AbstractCommand
qqmyers Dec 5, 2018
714c794
extend AbstractCommand
qqmyers Dec 6, 2018
5109c4c
farewell DPN
qqmyers Dec 6, 2018
d89cfb8
Merge remote-tracking branch 'IQSS/develop' into QDR-953-dpn
qqmyers Dec 22, 2018
01a079f
debug msgs
qqmyers Jan 2, 2019
375cc17
Merge remote-tracking branch 'IQSS/develop' into QDR-953-dpn
qqmyers Jan 8, 2019
9ffe806
remove logging
qqmyers Jan 8, 2019
bd38898
update jsoup
qqmyers Jan 8, 2019
312f78c
latest jsoup reverses the order of two attributes...
qqmyers Jan 9, 2019
b7b0a68
switch to java.util logging
qqmyers Jan 9, 2019
9c8d749
Merge remote-tracking branch 'IQSS/develop' into QDR-953-dpn
qqmyers Jan 11, 2019
4506ddb
exclude log4j-over-slf4j
qqmyers Jan 11, 2019
139f574
update to commons-compress
qqmyers Jan 14, 2019
855205c
exclude duplicate aws jars
qqmyers Jan 14, 2019
f6cc089
moving db upgrade command
qqmyers Jan 14, 2019
8027fe7
use :doc: syntax to link to other pages #4706
pdurbin Jan 14, 2019
5122fd9
make JSON look nice #4706
pdurbin Jan 14, 2019
28046c6
use :doc ...
qqmyers Jan 14, 2019
28fabc9
Merge branch 'IQSS/4706-DPN_Submission_of_archival_copies' of https:/…
qqmyers Jan 14, 2019
b407fd7
cleanup script - revert to released version
qqmyers Jan 14, 2019
7b7f4b5
avoid smart quotes in curl commands with double backticks #4706
pdurbin Jan 14, 2019
2739fa2
missing properties
qqmyers Jan 15, 2019
eb6b47d
Merge branch 'IQSS/4706-DPN_Submission_of_archival_copies' of https:/…
qqmyers Jan 15, 2019
a236907
propagate exceptions if bag or xml file can't be created...
qqmyers Jan 15, 2019
983ceac
move details about DuraCloud/Chronopolis setup #4706
pdurbin Jan 16, 2019
e2312da
Merge pull request #19 from IQSS/4706-bagit-docs
qqmyers Jan 16, 2019
9045d7a
Make archiver api call async and cleanup DPN refs
qqmyers Jan 17, 2019
231e681
Merge branch 'IQSS/4706-DPN_Submission_of_archival_copies' of https:/…
qqmyers Jan 17, 2019
581e5dc
doc updates
qqmyers Jan 18, 2019
d3e8cf1
Update config.rst
kcondon Jan 22, 2019
3753dba
missing parsing for requiredSettings
qqmyers Jan 22, 2019
dd378ba
Merge branch 'IQSS/4706-DPN_Submission_of_archival_copies' of https:/…
qqmyers Jan 22, 2019
c77ab16
need : in setting names
qqmyers Jan 22, 2019
8d09fbf
Update config.rst
kcondon Jan 22, 2019
1 change: 1 addition & 0 deletions .gitignore
@@ -48,3 +48,4 @@ scripts/installer/default.config
# do not track IntelliJ IDEA files
.idea
**/*.iml
/bin/
11 changes: 9 additions & 2 deletions doc/sphinx-guides/source/admin/integrations.rst
@@ -93,15 +93,22 @@ SHARE
`SHARE <http://www.share-research.org>`_ is building a free, open data set about research and scholarly activities across their life cycle. It's possible to add an installation of Dataverse as one of the `sources <https://share.osf.io/sources>`_ they include if you contact the SHARE team.

Research Data Preservation
-------------------
--------------------------

Archivematica
+++++
+++++++++++++

`Archivematica <https://www.archivematica.org>`_ is an integrated suite of open-source tools for processing digital objects for long-term preservation, developed and maintained by Artefactual Systems Inc. Its configurable workflow is designed to produce system-independent, standards-based Archival Information Packages (AIPs) suitable for long-term storage and management.

Sponsored by the `Ontario Council of University Libraries (OCUL) <https://ocul.on.ca/>`_, this technical integration enables users of Archivematica to select datasets from connected Dataverse instances and process them for long-term access and digital preservation. For more information and a list of known issues, please refer to Artefactual's `release notes <https://wiki.archivematica.org/Archivematica_1.8_and_Storage_Service_0.13_release_notes>`_, `integration documentation <https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/transfer/dataverse/>`_, and the `project wiki <https://wiki.archivematica.org/Dataverse>`_.

DuraCloud/Chronopolis
+++++++++++++++++++++

Dataverse can be configured to submit a copy of published Datasets, packaged as `Research Data Alliance conformant <https://www.rd-alliance.org/system/files/Research%20Data%20Repository%20Interoperability%20WG%20-%20Final%20Recommendations_reviewed_0.pdf>`_ zipped `BagIt <https://tools.ietf.org/html/draft-kunze-bagit-17>`_ bags, to `Chronopolis <https://libraries.ucsd.edu/chronopolis/>`_ via `DuraCloud <https://duraspace.org/duracloud/>`_.

For details on how to configure this integration, look for "DuraCloud/Chronopolis" in the :doc:`/installation/config` section of the Installation Guide.

Future Integrations
-------------------

96 changes: 1 addition & 95 deletions doc/sphinx-guides/source/developers/big-data-support.rst
@@ -230,7 +230,7 @@ Configuring the RSAL Mock

Info for configuring the RSAL Mock: https://github.com/sbgrid/rsal/tree/master/mocks

Also, to configure Dataverse to use the new workflow you must do the following (see also the section below on workflows):
Also, to configure Dataverse to use the new workflow you must do the following (see also the :doc:`workflows` section):

1. Configure the RSAL URL:

@@ -301,98 +301,4 @@ In the GUI, this is called "Local Access". It's where you can compute on files o

``curl http://localhost:8080/api/admin/settings/:LocalDataAccessPath -X PUT -d "/programs/datagrid"``

Workflows
---------

Dataverse can perform two sequences of actions when datasets are published: one prior to publishing (marked by a ``PrePublishDataset`` trigger), and one after the publication has succeeded (``PostPublishDataset``). The pre-publish workflow is useful for having an external system prepare a dataset for public access (a possibly lengthy activity that can involve moving files around, uploading videos to a streaming server, etc.), or for starting an approval process. A post-publish workflow might be used for sending notifications about the newly published dataset.

Workflow steps are created using *step providers*. Dataverse ships with an internal step provider that offers some basic functionality, and with the ability to load 3rd party step providers. This allows installations to implement functionality they need without changing the Dataverse source code.

Steps can be internal (say, writing some data to the log) or external. External steps involve Dataverse sending a request to an external system, and waiting for the system to reply. The wait period is arbitrary, and so allows the external system unbounded operation time. This is useful, e.g., for steps that require human intervention, such as manual approval of a dataset publication.

The external system reports the step result back to Dataverse by sending an HTTP ``POST`` command to ``api/workflows/{invocation-id}``. The body of the request is passed to the paused step for further processing.

If a step in a workflow fails, Dataverse makes an effort to roll back all the steps that preceded it. Some actions, such as writing to the log, cannot be rolled back. If such an action has a public external effect (e.g., sending an email to a mailing list), it is advisable to put it in the post-publish workflow.

.. tip::
   For invoking external systems using a REST API, Dataverse's internal step
   provider offers a step for sending and receiving customizable HTTP requests.
   It's called *http/sr*, and is detailed below.

Administration
~~~~~~~~~~~~~~

A Dataverse instance stores a set of workflows in its database. Workflows can be managed using the ``api/admin/workflows/`` endpoints of the :doc:`/api/native-api`. Sample workflow files are available in ``scripts/api/data/workflows``.

At the moment, defining a workflow for each trigger is done for the entire instance, using the endpoint ``api/admin/workflows/default/«trigger type»``.

In order to prevent unauthorized resuming of workflows, Dataverse maintains a "white list" of IP addresses from which resume requests are honored. This list is maintained using the ``/api/admin/workflows/ip-whitelist`` endpoint of the :doc:`/api/native-api`. By default, Dataverse honors resume requests from localhost only (``127.0.0.1;::1``), so set-ups that use a single server work with no additional configuration.


Available Steps
~~~~~~~~~~~~~~~

Dataverse has an internal step provider, whose id is ``:internal``. It offers the following steps:

log
^^^

A step that writes data about the current workflow invocation to the instance log. It also writes the messages in its ``parameters`` map.

.. code:: json

   {
     "provider":":internal",
     "stepType":"log",
     "parameters": {
       "aMessage": "message content",
       "anotherMessage": "message content, too"
     }
   }


pause
^^^^^

A step that pauses the workflow. The workflow is paused until a POST request is sent to ``/api/workflows/{invocation-id}``.

.. code:: json

   {
     "provider":":internal",
     "stepType":"pause"
   }


http/sr
^^^^^^^

A step that sends an HTTP request to an external system, and then waits for a response. The response has to match a regular expression specified in the step parameters. The URL, content type, and message body can use data from the workflow context, using a simple markup language. This step has specific parameters for rollback.

.. code:: json

   {
     "provider":":internal",
     "stepType":"http/sr",
     "parameters": {
       "url":"http://localhost:5050/dump/${invocationId}",
       "method":"POST",
       "contentType":"text/plain",
       "body":"START RELEASE ${dataset.id} as ${dataset.displayName}",
       "expectedResponse":"OK.*",
       "rollbackUrl":"http://localhost:5050/dump/${invocationId}",
       "rollbackMethod":"DELETE ${dataset.id}"
     }
   }

Available variables are:

* ``invocationId``
* ``dataset.id``
* ``dataset.identifier``
* ``dataset.globalId``
* ``dataset.displayName``
* ``dataset.citation``
* ``minorVersion``
* ``majorVersion``
* ``releaseStatus``
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/developers/index.rst
@@ -31,3 +31,4 @@ Developer Guide
geospatial
selinux
big-data-support
workflows
130 changes: 130 additions & 0 deletions doc/sphinx-guides/source/developers/workflows.rst
@@ -0,0 +1,130 @@
Workflows
================

Dataverse has a flexible workflow mechanism that can be used to trigger actions before and after Dataset publication.

.. contents:: |toctitle|
:local:


Introduction
------------

Dataverse can perform two sequences of actions when datasets are published: one prior to publishing (marked by a ``PrePublishDataset`` trigger), and one after the publication has succeeded (``PostPublishDataset``). The pre-publish workflow is useful for having an external system prepare a dataset for public access (a possibly lengthy activity that can involve moving files around, uploading videos to a streaming server, etc.), or for starting an approval process. A post-publish workflow might be used for sending notifications about the newly published dataset.

Workflow steps are created using *step providers*. Dataverse ships with an internal step provider that offers some basic functionality, and with the ability to load 3rd party step providers. This allows installations to implement functionality they need without changing the Dataverse source code.

Steps can be internal (say, writing some data to the log) or external. External steps involve Dataverse sending a request to an external system, and waiting for the system to reply. The wait period is arbitrary, and so allows the external system unbounded operation time. This is useful, e.g., for steps that require human intervention, such as manual approval of a dataset publication.

The external system reports the step result back to Dataverse by sending an HTTP ``POST`` command to ``api/workflows/{invocation-id}``. The body of the request is passed to the paused step for further processing.

If a step in a workflow fails, Dataverse makes an effort to roll back all the steps that preceded it. Some actions, such as writing to the log, cannot be rolled back. If such an action has a public external effect (e.g., sending an email to a mailing list), it is advisable to put it in the post-publish workflow.

.. tip::
   For invoking external systems using a REST API, Dataverse's internal step
   provider offers a step for sending and receiving customizable HTTP requests.
   It's called *http/sr*, and is detailed below.

Administration
~~~~~~~~~~~~~~

A Dataverse instance stores a set of workflows in its database. Workflows can be managed using the ``api/admin/workflows/`` endpoints of the :doc:`/api/native-api`. Sample workflow files are available in ``scripts/api/data/workflows``.
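
For example, a workflow defined in a JSON file could be registered with a call along these lines (a sketch: the file name is illustrative, and the call is assumed to follow the native API's usual conventions; the response should include the id assigned to the new workflow, which is used when setting a default workflow below):

``curl -X POST -H "Content-type: application/json" --upload-file my-workflow.json http://localhost:8080/api/admin/workflows``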

At the moment, defining a workflow for each trigger is done for the entire instance, using the endpoint ``api/admin/workflows/default/«trigger type»``.
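
As a sketch, making the workflow with id ``2`` (illustrative) the default ``PrePublishDataset`` workflow might look like:

``curl -X PUT -d 2 http://localhost:8080/api/admin/workflows/default/PrePublishDataset``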

In order to prevent unauthorized resuming of workflows, Dataverse maintains a "white list" of IP addresses from which resume requests are honored. This list is maintained using the ``/api/admin/workflows/ip-whitelist`` endpoint of the :doc:`/api/native-api`. By default, Dataverse honors resume requests from localhost only (``127.0.0.1;::1``), so set-ups that use a single server work with no additional configuration.
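
As a sketch for a multi-server setup (the added address is illustrative), the whitelist could be replaced with:

``curl -X PUT -d "127.0.0.1;::1;10.0.0.4" http://localhost:8080/api/admin/workflows/ip-whitelist``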


Available Steps
~~~~~~~~~~~~~~~

Dataverse has an internal step provider, whose id is ``:internal``. It offers the following steps:

log
+++

A step that writes data about the current workflow invocation to the instance log. It also writes the messages in its ``parameters`` map.

.. code:: json

   {
     "provider":":internal",
     "stepType":"log",
     "parameters": {
       "aMessage": "message content",
       "anotherMessage": "message content, too"
     }
   }


pause
+++++

A step that pauses the workflow. The workflow is paused until a POST request is sent to ``/api/workflows/{invocation-id}``.

.. code:: json

   {
     "provider":":internal",
     "stepType":"pause"
   }
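
For example, the external system (or a curl call standing in for it) might resume the paused workflow as sketched below, where ``{invocation-id}`` is the id Dataverse supplied when it invoked the step and the body (``done`` here, purely illustrative) is passed to the paused step:

``curl -X POST -d "done" http://localhost:8080/api/workflows/{invocation-id}``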


http/sr
+++++++

A step that sends an HTTP request to an external system, and then waits for a response. The response has to match a regular expression specified in the step parameters. The URL, content type, and message body can use data from the workflow context, using a simple markup language. This step has specific parameters for rollback.

.. code:: json

   {
     "provider":":internal",
     "stepType":"http/sr",
     "parameters": {
       "url":"http://localhost:5050/dump/${invocationId}",
       "method":"POST",
       "contentType":"text/plain",
       "body":"START RELEASE ${dataset.id} as ${dataset.displayName}",
       "expectedResponse":"OK.*",
       "rollbackUrl":"http://localhost:5050/dump/${invocationId}",
       "rollbackMethod":"DELETE ${dataset.id}"
     }
   }

Available variables are:

* ``invocationId``
* ``dataset.id``
* ``dataset.identifier``
* ``dataset.globalId``
* ``dataset.displayName``
* ``dataset.citation``
* ``minorVersion``
* ``majorVersion``
* ``releaseStatus``
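
For illustration, assuming the hypothetical values ``42`` for ``dataset.id`` and ``My Dataset`` for ``dataset.displayName``, the ``body`` template in the example above would be sent as ``START RELEASE 42 as My Dataset``.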

archiver
++++++++

A step that sends an archival copy of a Dataset Version to a configured archiver, e.g. the DuraCloud interface of Chronopolis. See the `DuraCloud/Chronopolis Integration documentation <http://guides.dataverse.org/en/latest/admin/integrations.html#id15>`_ for further detail.

Note: the example step includes two settings required for any archiver and three (the ``DuraCloud*`` settings) that are specific to DuraCloud.

.. code:: json


   {
     "provider":":internal",
     "stepType":"archiver",
     "parameters": {
       "stepName":"archive submission"
     },
     "requiredSettings": {
       ":ArchiverClassName": "string",
       ":ArchiverSettings": "string",
       ":DuraCloudHost":"string",
       ":DuraCloudPort":"string",
       ":DuraCloudContext":"string"
     }
   }
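
As a hedged sketch of how such settings might be supplied (the class name and values shown are illustrative; consult the :doc:`/installation/config` section of the Installation Guide for the exact settings your archiver requires):

``curl -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.DuraCloudSubmitToArchiveCommand" http://localhost:8080/api/admin/settings/:ArchiverClassName``

``curl -X PUT -d ":DuraCloudHost, :DuraCloudPort, :DuraCloudContext" http://localhost:8080/api/admin/settings/:ArchiverSettings``

``curl -X PUT -d "demo.duracloud.org" http://localhost:8080/api/admin/settings/:DuraCloudHost``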
