Merge branch 'IQSS:develop' into 10015_ro_crate_mime_type
ErykKul authored Apr 22, 2024
2 parents 99813de + cf79282 commit 902dcb3
Showing 34 changed files with 378 additions and 134 deletions.
5 changes: 5 additions & 0 deletions doc/release-notes/10022_upload_redirect_without_tagging.md
@@ -0,0 +1,5 @@
If your S3 store does not support tagging and returns an error when direct upload is configured, you can disable tagging with the ``dataverse.files.<id>.disable-tagging`` JVM option. For more details, see https://dataverse-guide--10029.org.readthedocs.build/en/10029/developers/big-data-support.html#s3-tags as well as #10022 and #10029.

## New config options

- dataverse.files.<id>.disable-tagging
5 changes: 5 additions & 0 deletions doc/release-notes/10316_cvoc_http_headers.md
@@ -0,0 +1,5 @@
You are now able to add HTTP request headers required by the External Vocabulary Services you are implementing.

Combined documentation can be found in pull request [#10404](https://github.com/IQSS/dataverse/pull/10404).

For more information, see issue [#10316](https://github.com/IQSS/dataverse/issues/10316) and pull request [gdcc/dataverse-external-vocab-support#19](https://github.com/gdcc/dataverse-external-vocab-support/pull/19).
@@ -0,0 +1,22 @@
The Dataverse object returned by /api/dataverses has been extended to include "isReleased": {boolean}.
```javascript
{
"status": "OK",
"data": {
"id": 32,
"alias": "dv6f645bb5",
"name": "dv6f645bb5",
"dataverseContacts": [
{
"displayOrder": 0,
"contactEmail": "54180268@mailinator.com"
}
],
"permissionRoot": true,
"dataverseType": "UNCATEGORIZED",
"ownerId": 1,
"creationDate": "2024-04-12T18:05:59Z",
"isReleased": true
}
}
```
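A client script can branch on the new field. The following is a minimal sketch, assuming the response has the shape shown above. The helper name `is_released` is hypothetical, and the extraction is a plain text match with `sed` rather than a real JSON parser, so a production script would more likely use a dedicated JSON tool.

```shell
#!/bin/sh
# Hypothetical helper: read a Dataverse JSON response on stdin and print
# the value of "isReleased" (true or false). This is a text match, not a
# JSON parser, so it assumes the field appears once, as in the sample.
is_released() {
    sed -n 's/.*"isReleased"[[:space:]]*:[[:space:]]*\([a-z]*\).*/\1/p' | head -n 1
}

# Usage with a trimmed-down copy of the sample response.
sample='{"status":"OK","data":{"alias":"dv6f645bb5","isReleased":true}}'
printf '%s\n' "$sample" | is_released
```

In practice the JSON would come from a `curl` call against your installation instead of the hard-coded `sample` string.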
1 change: 1 addition & 0 deletions doc/release-notes/9887-new-superuser-status-endpoint.md
@@ -0,0 +1 @@
The existing API endpoint for toggling the superuser status of a user has been deprecated in favor of a new API endpoint that allows you to explicitly and idempotently set the status as true or false. For details, see [the guides](https://dataverse-guide--10440.org.readthedocs.build/en/10440/api/native-api.html), #9887 and #10440.
6 changes: 4 additions & 2 deletions doc/sphinx-guides/source/admin/metadatacustomization.rst
@@ -552,6 +552,8 @@ Great care must be taken when reloading a metadata block. Matching is done on fi

The ability to reload metadata blocks means that SQL update scripts don't need to be written for these changes. See also the :doc:`/developers/sql-upgrade-scripts` section of the Developer Guide.

.. _using-external-vocabulary-services:

Using External Vocabulary Services
----------------------------------

@@ -577,9 +579,9 @@ In general, the external vocabulary support mechanism may be a better choice for
The specifics of the user interface for entering/selecting a vocabulary term and how that term is then displayed are managed by third-party Javascripts. The initial Javascripts that have been created provide auto-completion, displaying a list of choices that match what the user has typed so far, but other interfaces, such as displaying a tree of options for a hierarchical vocabulary, are possible.
Similarly, existing scripts do relatively simple things for displaying a term - showing the term's name in the appropriate language and providing a link to an external URL with more information, but more sophisticated displays are possible.

Scripts supporting use of vocabularies from services supporting the SKOSMOS protocol (see https://skosmos.org) and retrieving ORCIDs (from https://orcid.org) are available at https://github.com/gdcc/dataverse-external-vocab-support. (Custom scripts can also be used and community members are encouraged to share new scripts through the dataverse-external-vocab-support repository.)
Scripts supporting use of vocabularies from services supporting the SKOSMOS protocol (see https://skosmos.org), retrieving ORCIDs (from https://orcid.org), and using ROR (https://ror.org/) are available at https://github.com/gdcc/dataverse-external-vocab-support. (Custom scripts can also be used and community members are encouraged to share new scripts through the dataverse-external-vocab-support repository.)

Configuration involves specifying which fields are to be mapped, whether free-text entries are allowed, which vocabulary(ies) should be used, what languages those vocabulary(ies) are available in, and several service protocol and service instance specific parameters.
Configuration involves specifying which fields are to be mapped, whether free-text entries are allowed, which vocabulary(ies) should be used, what languages those vocabulary(ies) are available in, and several service protocol and service instance specific parameters, including the ability to send HTTP headers on calls to the service.
These are all defined in the :ref:`:CVocConf <:CVocConf>` setting as a JSON array. Details about the required elements as well as example JSON arrays are available at https://github.com/gdcc/dataverse-external-vocab-support, along with an example metadata block that can be used for testing.
The scripts required can be hosted locally or retrieved dynamically from https://gdcc.github.io/ (similar to how dataverse-previewers work).
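
As a hedged illustration of the HTTP header support mentioned above, the sketch below writes an incomplete `:CVocConf` fragment to a file. The `headers` key name is taken from the `cvocEntry.getJsonObject("headers")` lookup in this commit's Java change, which reads each key as a header name and each value as a string. The header name, its value, the file name, and the helper name `write_cvoc_headers_fragment` are placeholders, and the other keys that each entry requires are deliberately omitted.

```shell
#!/bin/sh
# Write an illustrative, INCOMPLETE :CVocConf fragment to the given file.
# Only the "headers" object is shown. The header name and value below are
# placeholders (assumptions), and the remaining keys that every entry
# needs are omitted; see the dataverse-external-vocab-support docs.
write_cvoc_headers_fragment() {
    cat > "$1" <<'EOF'
[
  {
    "headers": {
      "X-Example-Api-Key": "replace-with-your-key"
    }
  }
]
EOF
}

write_cvoc_headers_fragment cvoc-headers-fragment.json
cat cvoc-headers-fragment.json
```

Because the Java code reads each value with `getString`, header values should be JSON strings rather than numbers or booleans.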

5 changes: 5 additions & 0 deletions doc/sphinx-guides/source/api/changelog.rst
@@ -7,6 +7,11 @@ This API changelog is experimental and we would love feedback on its usefulness.
:local:
:depth: 1

v6.3
----

- **/api/admin/superuser/{identifier}**: The POST endpoint that toggles superuser status has been deprecated in favor of a new PUT endpoint that allows you to specify true or false. See :ref:`set-superuser-status`.

v6.2
----

46 changes: 40 additions & 6 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -2013,7 +2013,7 @@ The fully expanded example above (without environment variables) looks like this
.. _cleanup-storage-api:

Cleanup storage of a Dataset
Cleanup Storage of a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is an experimental feature and should be tested on your system before using it in production.
@@ -5476,12 +5476,46 @@ Example: ``curl -H "X-Dataverse-key: $API_TOKEN" -X POST "https://demo.datavers
This action changes the identifier of user johnsmith to jsmith.
Make User a SuperUser
~~~~~~~~~~~~~~~~~~~~~
Toggle Superuser Status
~~~~~~~~~~~~~~~~~~~~~~~
Toggle the superuser status of a user.
.. note:: This endpoint is deprecated as explained in :doc:`/api/changelog`. Please use the :ref:`set-superuser-status` endpoint instead.
.. code-block:: bash
export SERVER_URL=http://localhost:8080
export USERNAME=jdoe
curl -X POST "$SERVER_URL/api/admin/superuser/$USERNAME"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
curl -X POST "http://localhost:8080/api/admin/superuser/jdoe"
Toggles superuser mode on the ``AuthenticatedUser`` whose ``identifier`` (without the ``@`` sign) is passed. ::
.. _set-superuser-status:
Set Superuser Status
~~~~~~~~~~~~~~~~~~~~
Specify the superuser status of a user with a boolean value (``true`` or ``false``).
.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below.
.. code-block:: bash
export SERVER_URL=http://localhost:8080
export USERNAME=jdoe
export IS_SUPERUSER=true
curl -X PUT "$SERVER_URL/api/admin/superuser/$USERNAME" -d "$IS_SUPERUSER"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
POST http://$SERVER/api/admin/superuser/$identifier
curl -X PUT "http://localhost:8080/api/admin/superuser/jdoe" -d true
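
Because the PUT endpoint sets the status explicitly instead of toggling it, a setup script can safely run the same call more than once. The sketch below is a minimal illustration under that assumption: the helper name `superuser_cmd` is hypothetical, and it only prints the `curl` command (a dry run) after checking that the value is `true` or `false`.

```shell
#!/bin/sh
# Hypothetical helper: print (without running) the curl command that sets
# a user's superuser status. Anything other than true/false is rejected,
# since the PUT endpoint expects an explicit boolean in the request body.
superuser_cmd() {
    server_url=$1
    username=$2
    is_superuser=$3
    case "$is_superuser" in
        true|false) ;;
        *) echo "status must be true or false" >&2; return 1 ;;
    esac
    printf 'curl -X PUT "%s/api/admin/superuser/%s" -d "%s"\n' \
        "$server_url" "$username" "$is_superuser"
}

superuser_cmd http://localhost:8080 jdoe true
```

Running the printed command twice leaves the user in the same state, which is not true of the deprecated POST toggle.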
.. _delete-a-user:
@@ -5845,7 +5879,7 @@ Superusers can add a new license by posting a JSON file adapted from this exampl
.. code-block:: bash
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
curl -X POST -H 'Content-Type: application/json' -H "X-Dataverse-key:$API_TOKEN" --data-binary @add-license.json "$SERVER_URL/api/licenses"
curl -X POST -H 'Content-Type: application/json' -H "X-Dataverse-key:$API_TOKEN" --upload-file add-license.json "$SERVER_URL/api/licenses"
Superusers can change whether an existing license is active (usable for new dataset versions) or inactive (only allowed on already-published versions) specified by the license ``$ID``:
Expand Down
7 changes: 6 additions & 1 deletion doc/sphinx-guides/source/developers/big-data-support.rst
@@ -81,7 +81,12 @@ with the contents of the file cors.json as follows:
Alternatively, you can enable CORS using the AWS S3 web interface, using json-encoded rules as in the example above.

Since the direct upload mechanism creates the final file rather than an intermediate temporary file, user actions, such as neither saving nor canceling an upload session before closing the browser page, can leave an abandoned file in the store. The direct upload mechanism attempts to use S3 Tags to aid in identifying/removing such files. Upon upload, files are given a "dv-state":"temp" tag which is removed when the dataset changes are saved and the new file(s) are added in the Dataverse installation. Note that not all S3 implementations support Tags: Minio does not. With such stores, direct upload works, but Tags are not used.
.. _s3-tags-and-direct-upload:

S3 Tags and Direct Upload
~~~~~~~~~~~~~~~~~~~~~~~~~

Since the direct upload mechanism creates the final file rather than an intermediate temporary file, user actions, such as neither saving nor canceling an upload session before closing the browser page, can leave an abandoned file in the store. The direct upload mechanism attempts to use S3 tags to aid in identifying/removing such files. Upon upload, files are given a "dv-state":"temp" tag which is removed when the dataset changes are saved and new files are added in the Dataverse installation. Note that not all S3 implementations support tags. Minio, for example, does not. With such stores, direct upload may not work and you might need to disable tagging. For details, see :ref:`s3-tagging` in the Installation Guide.

Trusted Remote Storage with the ``remote`` Store Type
-----------------------------------------------------
6 changes: 6 additions & 0 deletions doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
@@ -79,6 +79,12 @@ In the single part case, only one call to the supplied URL is required:
curl -i -H 'x-amz-tagging:dv-state=temp' -X PUT -T <filename> "<supplied url>"
Or, if you have disabled S3 tagging (see :ref:`s3-tagging`), you should omit the header like this:

.. code-block:: bash
curl -i -X PUT -T <filename> "<supplied url>"
Note that without the ``-i`` flag, you should not expect any output from the command above. With the ``-i`` flag, you should expect to see a "200 OK" response.
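
The two single-part variants above differ only in the tagging header, so a client can choose between them based on how the store is configured. The following is a minimal sketch under that assumption: the helper name `upload_cmd` and its third argument are hypothetical, and it only prints the `curl` command (a dry run) instead of uploading anything.

```shell
#!/bin/sh
# Hypothetical helper: print (without running) the single-part upload
# command. Pass "true" as the third argument when the store is configured
# with dataverse.files.<id>.disable-tagging=true, so the header is omitted.
upload_cmd() {
    file=$1
    url=$2
    tagging_disabled=${3:-false}
    if [ "$tagging_disabled" = "true" ]; then
        printf 'curl -i -X PUT -T "%s" "%s"\n' "$file" "$url"
    else
        printf "curl -i -H 'x-amz-tagging:dv-state=temp' -X PUT -T \"%s\" \"%s\"\n" \
            "$file" "$url"
    fi
}

# Placeholder file name and URL, for illustration only.
upload_cmd data.csv "https://example.org/presigned-url"
upload_cmd data.csv "https://example.org/presigned-url" true
```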

In the multipart case, the client must send each part and collect the 'eTag' responses from the server. The calls for this are the same as the one for the single part case except that each call should send a <partSize> slice of the total file, with the last part containing the remaining bytes.
28 changes: 26 additions & 2 deletions doc/sphinx-guides/source/installation/config.rst
@@ -1189,12 +1189,31 @@ Larger installations may want to increase the number of open S3 connections allo

``./asadmin create-jvm-options "-Ddataverse.files.<id>.connection-pool-size=4096"``

.. _s3-tagging:

S3 Tagging
##########

By default, when direct upload to an S3 store is configured, Dataverse places a ``temp`` tag on the file being uploaded, which makes cleanup easier if the file is not added to the dataset after upload (e.g., if the user cancels the operation). (See :ref:`s3-tags-and-direct-upload`.)
If your S3 store does not support tagging and gives an error when direct upload is configured, you can disable the tagging by using the ``dataverse.files.<id>.disable-tagging`` JVM option. For example:

``./asadmin create-jvm-options "-Ddataverse.files.<id>.disable-tagging=true"``

Disabling the ``temp`` tag makes it harder to identify abandoned files that are not used by your Dataverse instance (i.e. one cannot search for the ``temp`` tag in a delete script). These should still be removed to avoid wasting storage space. To clean up these files and any other leftover files, regardless of whether the ``temp`` tag is applied, you can use the :ref:`cleanup-storage-api` API endpoint.

Note that if you disable tagging, you should omit the ``x-amz-tagging:dv-state=temp`` header when using the :doc:`/developers/s3-direct-upload-api`, as noted in that section.

Finalizing S3 Configuration
###########################

If you would like to configure Dataverse to use a custom S3 service instead of Amazon S3 services, please
add the options for the custom URL and region as documented below. Please check above whether your desired combination has
already been tested and which other options were set for a successful integration.

Lastly, go ahead and restart your Payara server. With Dataverse deployed and the site online, you should be able to upload datasets and data files and see the corresponding files in your S3 bucket. Within a bucket, the folder structure emulates that found in local file storage.

.. _list-of-s3-storage-options:

List of S3 Storage Options
##########################

@@ -1222,6 +1241,7 @@ List of S3 Storage Options
dataverse.files.<id>.payload-signing ``true``/``false`` Enable payload signing. Optional ``false``
dataverse.files.<id>.chunked-encoding ``true``/``false`` Disable chunked encoding. Optional ``true``
dataverse.files.<id>.connection-pool-size <?> The maximum number of open connections to the S3 server ``256``
dataverse.files.<id>.disable-tagging ``true``/``false`` Do not place the ``temp`` tag when redirecting the upload to the S3 server. ``false``
=========================================== ================== =================================================================================== =============

.. table::
@@ -4458,9 +4478,13 @@ A boolean setting that, if true, will send an email and notification to users wh
:CVocConf
+++++++++

A JSON-structured setting that configures Dataverse to associate specific metadatablock fields with external vocabulary services and specific vocabularies/sub-vocabularies managed by that service. More information about this capability is available at :doc:`/admin/metadatacustomization`.
The ``:CVocConf`` database setting is used to allow metadatablock fields to look up values in external vocabulary services. For example, you could configure the "Author Affiliation" field to look up organizations in the `Research Organization Registry (ROR) <https://ror.org>`_. For a high-level description of this feature, see :ref:`using-external-vocabulary-services` in the Admin Guide.

The expected format for the ``:CVocConf`` database setting is JSON but the details are not documented here. Instead, please refer to `docs/readme.md <https://github.com/gdcc/dataverse-external-vocab-support/blob/main/docs/readme.md>`_ in the https://github.com/gdcc/dataverse-external-vocab-support repo.

That repository also includes scripts that implement the lookup for specific service protocols, a JSON Schema for validating the structure required by this setting, and an example metadatablock with a sample ``:CVocConf`` config that associates fields in the example block with ORCID and SKOSMOS based services.

Scripts that implement this association for specific service protocols are maintained at https://github.com/gdcc/dataverse-external-vocab-support. That repository also includes a json-schema for validating the structure required by this setting along with an example metadatablock and sample :CVocConf setting values associating entries in the example block with ORCID and SKOSMOS based services.
The commands below should give you an idea of how to load the configuration, but you'll want to study the examples and make decisions about which configuration to use:

``wget https://gdcc.github.io/dataverse-external-vocab-support/examples/config/cvoc-conf.json``

2 changes: 1 addition & 1 deletion scripts/api/setup-all.sh
@@ -65,7 +65,7 @@ echo
echo "Setting up the admin user (and as superuser)"
adminResp=$(curl -s -H "Content-type:application/json" -X POST -d @"$SCRIPT_PATH"/data/user-admin.json "${DATAVERSE_URL}/api/builtin-users?password=$DV_SU_PASSWORD&key=burrito")
echo "$adminResp"
curl -X POST "${DATAVERSE_URL}/api/admin/superuser/dataverseAdmin"
curl -X PUT "${DATAVERSE_URL}/api/admin/superuser/dataverseAdmin" -d "true"
echo

echo "Setting up the root dataverse"
@@ -505,7 +505,14 @@ public void process(HttpResponse response, HttpContext context) throws HttpExcep
HttpGet httpGet = new HttpGet(retrievalUri);
//application/json+ld is for backward compatibility
httpGet.addHeader("Accept", "application/ld+json, application/json+ld, application/json");

//Add any custom HTTP request headers configured for this entry
final JsonObject headers = cvocEntry.getJsonObject("headers");
if (headers != null) {
final Set<String> headerKeys = headers.keySet();
for (final String hKey: headerKeys) {
httpGet.addHeader(hKey, headers.getString(hKey));
}
}
HttpResponse response = httpClient.execute(httpGet);
String data = EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);
int statusCode = response.getStatusLine().getStatusCode();