Merge branch 'IQSS:develop' into 10015_ro_crate_mime_type
ErykKul authored May 2, 2024
2 parents 95dd558 + a329f29 commit 0252cdb
Showing 63 changed files with 1,863 additions and 304 deletions.
3 changes: 2 additions & 1 deletion conf/solr/9.3.0/schema.xml
@@ -157,7 +157,8 @@
<field name="publicationStatus" type="string" stored="true" indexed="true" multiValued="true"/>
<field name="externalStatus" type="string" stored="true" indexed="true" multiValued="false"/>
<field name="embargoEndDate" type="plong" stored="true" indexed="true" multiValued="false"/>

<field name="retentionEndDate" type="plong" stored="true" indexed="true" multiValued="false"/>

<field name="subtreePaths" type="string" stored="true" indexed="true" multiValued="true"/>

<field name="fileName" type="text_en" stored="true" indexed="true" multiValued="true"/>
@@ -0,0 +1,4 @@
The following have been optimized for API calls involving large datasets (those with numerous files, for example ~10k):

- The search API endpoint.
- The permission checking logic present in PermissionServiceBean.
8 changes: 8 additions & 0 deletions doc/release-notes/9375-retention-period.md
@@ -0,0 +1,8 @@
The Dataverse Software now supports file-level retention periods. The ability to set retention periods, with a minimum duration (in months), can be configured by a Dataverse installation administrator. For more information, see the [Retention Periods section](https://guides.dataverse.org/en/6.3/user/dataset-management.html#retention-periods) of the Dataverse Software Guides.

- Users can configure a specific retention period, defined by an end date and a short reason, on a set of selected files or an individual file, by selecting the 'Retention Period' menu item and entering information in a popup dialog. Retention Periods can only be set, changed, or removed before a file has been published. After publication, only Dataverse installation administrators can make changes, using an API.

- After the retention period expires, files cannot be previewed or downloaded (as if restricted, with no option to allow access requests). The file (landing) page and all the metadata remain available.


Release notes should mention that a Solr schema update is needed.
49 changes: 47 additions & 2 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -1228,6 +1228,7 @@ File access filtering is also optionally supported. In particular, by the follow
* ``Restricted``
* ``EmbargoedThenRestricted``
* ``EmbargoedThenPublic``
* ``RetentionPeriodExpired``

If no filter is specified, the files will match all of the above categories.
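
A client script may want to reject misspelled filter values before sending a request. The following is a minimal sketch, assuming the filter is matched exactly and case-sensitively against the access status categories named in this guide (``Public`` plus the four above); the function name ``is_valid_access_status`` is hypothetical and not part of Dataverse.

```bash
# Hypothetical helper (not part of Dataverse): succeed only for the
# access status categories named in this guide.
# Assumption: values are matched exactly, including letter case.
is_valid_access_status() {
  case "$1" in
    Public|Restricted|EmbargoedThenRestricted|EmbargoedThenPublic|RetentionPeriodExpired)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

A script could call ``is_valid_access_status "$FILTER" || exit 1`` before building the request.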

@@ -1277,7 +1278,7 @@ The returned file counts are based on different criteria:
- Per content type
- Per category name
- Per tabular tag name
- Per access status (Possible values: Public, Restricted, EmbargoedThenRestricted, EmbargoedThenPublic)
- Per access status (Possible values: Public, Restricted, EmbargoedThenRestricted, EmbargoedThenPublic, RetentionPeriodExpired)

.. code-block:: bash
@@ -1331,6 +1332,7 @@ File access filtering is also optionally supported. In particular, by the follow
* ``Restricted``
* ``EmbargoedThenRestricted``
* ``EmbargoedThenPublic``
* ``RetentionPeriodExpired``

If no filter is specified, the files will match all of the above categories.

@@ -2146,6 +2148,7 @@ File access filtering is also optionally supported. In particular, by the follow
* ``Restricted``
* ``EmbargoedThenRestricted``
* ``EmbargoedThenPublic``
* ``RetentionPeriodExpired``

If no filter is specified, the files will match all of the above categories.

@@ -2583,7 +2586,38 @@ The API call requires a Json body that includes the list of the fileIds that the
  export JSON='{"fileIds":[300,301]}'

  curl -H "X-Dataverse-key: $API_TOKEN" -H "Content-Type:application/json" "$SERVER_URL/api/datasets/:persistentId/files/actions/:unset-embargo?persistentId=$PERSISTENT_IDENTIFIER" -d "$JSON"

Set a Retention Period on Files in a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``/api/datasets/$dataset-id/files/actions/:set-retention`` can be used to set a retention period on one or more files in a dataset. Retention periods can be set on files that are only in a draft dataset version (and are not in any previously published version) by anyone who can edit the dataset. The same API call can be used by a superuser to add a retention period to files that have already been released as part of a previously published dataset version.

The API call requires a JSON body that includes the retention period's end date (dateUnavailable), a short reason (optional), and a list of the fileIds that the retention period should be set on. The dateUnavailable must be after the current date, and the duration (dateUnavailable minus today's date) must be longer than the value specified by the :ref:`:MinRetentionDurationInMonths` setting. All files listed must be in the specified dataset. For example:

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
  export JSON='{"dateUnavailable":"2051-12-31", "reason":"Standard project retention period", "fileIds":[300,301,302]}'

  curl -H "X-Dataverse-key: $API_TOKEN" -H "Content-Type:application/json" "$SERVER_URL/api/datasets/:persistentId/files/actions/:set-retention?persistentId=$PERSISTENT_IDENTIFIER" -d "$JSON"

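
When the file list comes from a script rather than being typed by hand, the JSON body can be assembled programmatically. The following is a minimal sketch, not part of Dataverse: ``build_retention_json`` is a hypothetical helper that checks the date shape and the numeric file IDs locally before anything is sent. It only checks that the date looks like ``YYYY-MM-DD``; the server remains the authority on whether the date satisfies the minimum duration.

```bash
# Hypothetical helper (not part of Dataverse): print the JSON body for
# :set-retention from an end date, an optional reason, and file IDs.
build_retention_json() {
  local date_unavailable="$1" reason="$2" id ids=""
  shift 2
  # Require the YYYY-MM-DD shape used for dateUnavailable in this guide.
  case "$date_unavailable" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "dateUnavailable must look like YYYY-MM-DD" >&2; return 1 ;;
  esac
  # Keep the sketch simple: refuse reasons that would need JSON escaping.
  case "$reason" in
    *'"'*|*\\*) echo "reason must not contain quotes or backslashes" >&2; return 1 ;;
  esac
  if [ "$#" -eq 0 ]; then
    echo "at least one file ID is required" >&2
    return 1
  fi
  for id in "$@"; do
    # File IDs are numeric database IDs.
    case "$id" in
      ''|*[!0-9]*) echo "invalid file ID: $id" >&2; return 1 ;;
    esac
    ids="${ids:+$ids,}$id"
  done
  # The reason is optional, so it is omitted when empty.
  if [ -n "$reason" ]; then
    printf '{"dateUnavailable":"%s","reason":"%s","fileIds":[%s]}\n' \
      "$date_unavailable" "$reason" "$ids"
  else
    printf '{"dateUnavailable":"%s","fileIds":[%s]}\n' "$date_unavailable" "$ids"
  fi
}
```

The result can be passed to the curl call, for example ``JSON=$(build_retention_json 2051-12-31 "Standard project retention period" 300 301 302)``.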
Remove a Retention Period on Files in a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``/api/datasets/$dataset-id/files/actions/:unset-retention`` can be used to remove a retention period on one or more files in a dataset. Retention periods can be removed from files that are only in a draft dataset version (and are not in any previously published version) by anyone who can edit the dataset. The same API call can be used by a superuser to remove retention periods from files that have already been released as part of a previously published dataset version.

The API call requires a JSON body that includes the list of the fileIds that the retention period should be removed from. All files listed must be in the specified dataset. For example:

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
  export JSON='{"fileIds":[300,301]}'

  curl -H "X-Dataverse-key: $API_TOKEN" -H "Content-Type:application/json" "$SERVER_URL/api/datasets/:persistentId/files/actions/:unset-retention?persistentId=$PERSISTENT_IDENTIFIER" -d "$JSON"

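
The set and remove calls differ only in the action name, so a script that issues both can build the endpoint from one place. The following is a minimal sketch; ``retention_url`` is a hypothetical helper (not part of Dataverse) that reproduces the two URL patterns shown in this section and passes the persistent identifier through unchanged, as the examples here do.

```bash
# Hypothetical helper (not part of Dataverse): print the endpoint URL for
# setting or removing retention periods on a dataset identified by its PID.
retention_url() {
  local server_url="$1" pid="$2" action="$3"
  case "$action" in
    set|unset) ;;
    *) echo "action must be 'set' or 'unset'" >&2; return 1 ;;
  esac
  # ":persistentId" is the constant that stands in for the numeric dataset id;
  # a single trailing slash on the server URL is dropped.
  printf '%s/api/datasets/:persistentId/files/actions/:%s-retention?persistentId=%s\n' \
    "${server_url%/}" "$action" "$pid"
}
```

For example, ``curl -H "X-Dataverse-key: $API_TOKEN" -H "Content-Type:application/json" "$(retention_url "$SERVER_URL" "$PERSISTENT_IDENTIFIER" unset)" -d "$JSON"``.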
.. _Archival Status API:

@@ -5647,6 +5681,17 @@ List permissions a user (based on API Token used) has on a Dataverse collection
The ``$identifier`` can be a Dataverse collection alias or database id or a dataset persistent ID or database id.

.. note:: Datasets can be selected using persistent identifiers. This is done by passing the constant ``:persistentId`` where the numeric id of the dataset is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``.

Example: List permissions a user (based on API Token used) has on a dataset whose DOI is *10.5072/FK2/J8SJZB*:

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB

  curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/admin/permissions/:persistentId?persistentId=$PERSISTENT_IDENTIFIER"

Show Role Assignee
~~~~~~~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/container/running/demo.rst
@@ -220,4 +220,4 @@ Your feedback is extremely valuable to us! To let us know what you think, please
Getting Help
------------

Please do not be shy about reaching out for help. We very much want you to have a pleasant demo or evaluation experience. For ways to contact us, please see See :ref:`getting-help-containers`.
Please do not be shy about reaching out for help. We very much want you to have a pleasant demo or evaluation experience. For ways to contact us, please see :ref:`getting-help-containers`.
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/developers/big-data-support.rst
@@ -121,7 +121,7 @@ The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.Data
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
export JSON_DATA="{'description':'My description.','directoryLabel':'data/subdir1','categories':['Data'], 'restrict':'false', 'storageIdentifier':'trs://images/dataverse_project_logo.svg', 'fileName':'dataverse_logo.svg', 'mimeType':'image/svg+xml', 'checksum': {'@type': 'SHA-1', '@value': '123456'}}"
export JSON_DATA='{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"trs://images/dataverse_project_logo.svg", "fileName":"dataverse_logo.svg", "mimeType":"image/svg+xml", "checksum": {"@type": "SHA-1", "@value": "123456"}}'
curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"
12 changes: 12 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
@@ -4549,6 +4549,18 @@ can enter for an embargo end date. This limit will be enforced in the popup dial

``curl -X PUT -d 24 http://localhost:8080/api/admin/settings/:MaxEmbargoDurationInMonths``

.. _:MinRetentionDurationInMonths:

:MinRetentionDurationInMonths
+++++++++++++++++++++++++++++

This setting controls whether retention periods are allowed in a Dataverse instance and can set the minimum duration users are allowed to specify. A value of 0 months, or a missing
setting, indicates that retention periods are not supported. A value of -1 allows retention periods of any length. Any other value indicates the minimum number of months (from the current date) a user
can enter for a retention period end date. This limit will be enforced in the popup dialog in which users enter the retention period end date. For example, to set a ten-year minimum:

``curl -X PUT -d 120 http://localhost:8080/api/admin/settings/:MinRetentionDurationInMonths``
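
The three cases above can be summarized in a few lines of shell. This is a minimal sketch for administration scripts, not Dataverse code: ``retention_policy`` is a hypothetical function, its output labels are made up for illustration, and treating negative values other than -1 as invalid is an assumption.

```bash
# Hypothetical helper (not part of Dataverse): describe what a given
# :MinRetentionDurationInMonths value means, per the rules above.
retention_policy() {
  local value="${1:-0}"   # a missing setting behaves like 0
  case "$value" in
    0)  echo "disabled" ;;
    -1) echo "any-length" ;;
    # Assumption: anything that is not a plain non-negative integer is rejected.
    *[!0-9]*) echo "invalid setting: $value" >&2; return 1 ;;
    *)  echo "minimum-months:$value" ;;
  esac
}
```

With the ten-year example, ``retention_policy 120`` prints ``minimum-months:120``.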


:DataverseMetadataValidatorScript
+++++++++++++++++++++++++++++++++

8 changes: 8 additions & 0 deletions doc/sphinx-guides/source/user/dataset-management.rst
@@ -735,6 +735,14 @@ Once a dataset with embargoed files has been published, no further action is nee

As the primary use case of embargoes is to make the existence of data known now, with a promise (to a journal, project team, etc.) that the data itself will become available at a given future date, users cannot change an embargo once a dataset version is published. Dataverse instance administrators do have the ability to correct mistakes and make changes if/when circumstances warrant.

Retention Periods
=================

Support for file-level retention periods can also be configured in a Dataverse instance. Retention periods make file content inaccessible after the retention period end date. This means that file previews and the ability to download files will be blocked. The effect is similar to restricting a file, except that the retention period ends on the specified date without further action and, once it has expired, requests for file access cannot be made.

Retention periods are intended to support use cases where files must be made unavailable - and in most cases destroyed, e.g. to meet legal requirements - after a certain period or date.
Dataverse does not destroy the files automatically; if destruction is required, it has to be carried out on the storage itself.
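
Because destruction has to be handled outside Dataverse, an administrator might script a check for files whose end date has passed. The following is a minimal sketch, not Dataverse code: ``is_retention_expired`` is a hypothetical helper, and it assumes that a file is still available on the end date itself and becomes unavailable only from the following day.

```bash
# Hypothetical helper (not part of Dataverse): succeed when `today` is later
# than the retention end date. Both arguments are YYYY-MM-DD strings;
# `today` defaults to the current date.
is_retention_expired() {
  local end_date="$1" today="${2:-$(date +%F)}"
  local end_num today_num
  # Dropping the hyphens turns each ISO date into an integer (YYYYMMDD),
  # so chronological order matches numeric order.
  end_num=$(printf '%s' "$end_date" | tr -d '-')
  today_num=$(printf '%s' "$today" | tr -d '-')
  # Assumption: the end date itself still counts as inside the period.
  [ "$today_num" -gt "$end_num" ]
}
```

For example, ``is_retention_expired 2051-12-31 && echo "eligible for destruction"`` prints nothing until 2052-01-01.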

Dataset Versions
================

12 changes: 12 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/DataFile.java
@@ -242,6 +242,18 @@ public void setEmbargo(Embargo embargo) {
        this.embargo = embargo;
    }

    @ManyToOne
    @JoinColumn(name="retention_id")
    private Retention retention;

    public Retention getRetention() {
        return retention;
    }

    public void setRetention(Retention retention) {
        this.retention = retention;
    }

    public DataFile() {
        this.fileMetadatas = new ArrayList<>();
        initFileReplaceAttributes();
@@ -959,6 +959,7 @@ public boolean isThumbnailAvailable (DataFile file) {
            return true;
        }
        file.setPreviewImageFail(true);
        file.setPreviewImageAvailable(false);
        this.save(file);
        return false;
    }
@@ -1365,7 +1366,10 @@ public Embargo findEmbargo(Long id) {
        DataFile d = find(id);
        return d.getEmbargo();
    }

    public boolean isRetentionExpired(FileMetadata fm) {
        return FileUtil.isRetentionExpired(fm);
    }
    /**
     * Checks if the supplied DvObjectContainer (Dataset or Collection; although
     * only collection-level storage quotas are officially supported as of now)