Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sword Auth: use Dataverse 4 permission model #1070 #2495 #784 #3137

Merged
merged 10 commits into from
Jul 21, 2016
59 changes: 33 additions & 26 deletions doc/sphinx-guides/source/api/sword.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3,11 +3,11 @@ SWORD API

SWORD_ stands for "Simple Web-service Offering Repository Deposit" and is a "profile" of AtomPub (`RFC 5023`_) which is a RESTful API that allows non-Dataverse software to deposit files and metadata into a Dataverse installation. :ref:`client-libraries` are available in Python, Java, R, Ruby, and PHP.

Introduced in Dataverse Network (DVN) `3.6 <http://guides.dataverse.org/en/3.6.2/dataverse-api-main.html#data-deposit-api>`_, the SWORD API was formerly known as the "Data Deposit API" and ``data-deposit/v1`` appeared in the URLs. For backwards compatibility these URLs will continue to work (with deprecation warnings). Due to architectural changes and security improvements (especially the introduction of API tokens) in Dataverse 4.0, a few backward incompatible changes were necessarily introduced and for this reason the version has been increased to ``v1.1``. For details, see :ref:`incompatible`.
Introduced in Dataverse Network (DVN) `3.6 <http://guides.dataverse.org/en/3.6.2/dataverse-api-main.html#data-deposit-api>`_, the SWORD API was formerly known as the "Data Deposit API" and ``data-deposit/v1`` appeared in the URLs. For backwards compatibility these URLs continue to work (with deprecation warnings). Due to architectural changes and security improvements (especially the introduction of API tokens) in Dataverse 4.0, a few backward incompatible changes were necessarily introduced and for this reason the version has been increased to ``v1.1``. For details, see :ref:`incompatible`.

Dataverse implements most of SWORDv2_, which is specified at http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html . Please reference the `SWORDv2 specification`_ for expected HTTP status codes (i.e. 201, 204, 404, etc.), headers (i.e. "Location"), etc. For a quick introduction to SWORD, the two minute video at http://cottagelabs.com/news/intro-to-sword-2 is recommended.

As a profile of AtomPub, XML is used throughout SWORD. As of Dataverse 4.0 datasets can also be created via JSON using the "native" API.
As a profile of AtomPub, XML is used throughout SWORD. As of Dataverse 4.0 datasets can also be created via JSON using the "native" API. SWORD is limited to the dozen or so fields listed below in the crosswalk, but the native API allows you to populate all metadata fields available in Dataverse.

.. _SWORD: http://en.wikipedia.org/wiki/SWORD_%28protocol%29

Expand All @@ -24,9 +24,9 @@ As a profile of AtomPub, XML is used throughout SWORD. As of Dataverse 4.0 datas
Backward incompatible changes
-----------------------------

For better security, usernames and passwords are no longer accepted. The use of an API token is required.
For better security than in DVN 3.x, usernames and passwords are no longer accepted. The use of an API token is required.

In addition, differences in Dataverse 4.0 have lead to a few minor backward incompatible changes in the Dataverse implementation of SWORD, which are listed below. Old ``v1`` URLs should continue to work but the ``Service Document`` will contain a deprecation warning and responses will contain ``v1.1`` URLs. See also :ref:`known-issues`.
Differences in Dataverse 4 from DVN 3.x lead to a few minor backward incompatible changes in the Dataverse implementation of SWORD, which are listed below. Old ``v1`` URLs should continue to work but the ``Service Document`` will contain a deprecation warning and responses will contain ``v1.1`` URLs. See also :ref:`known-issues`.

- Newly required fields when creating/editing datasets for compliance with the `Joint Declaration for Data Citation principles <http://thedata.org/blog/joint-declaration-data-citation-principles-and-dataverse>`_.

Expand All @@ -41,11 +41,13 @@ In addition, differences in Dataverse 4.0 have lead to a few minor backward inco
New features as of v1.1
-----------------------

- Dataverse 4.0 supports API tokens and they must be used rather that a username and password. In the ``curl`` examples below, you will see ``curl -u $API_TOKEN:`` showing that you should send your API token as the username and nothing as the password. For example, ``curl -u 54b143b5-d001-4254-afc0-a1c0f6a5b5a7:``.
- Dataverse 4 supports API tokens and they must be used rather that a username and password. In the ``curl`` examples below, you will see ``curl -u $API_TOKEN:`` showing that you should send your API token as the username and nothing as the password. For example, ``curl -u 54b143b5-d001-4254-afc0-a1c0f6a5b5a7:``.

- Dataverses can be published via SWORD
- SWORD operations no longer require "admin" permission. In order to use any SWORD operation in DVN 3.x, you had to be "admin" on a dataverse (the container for your dataset) and similar rules were applied in Dataverse 4.4 and earlier (the ``EditDataverse`` permission was required). The SWORD API has now been fully integrated with the Dataverse 4 permission model such that any action you have permission to perform in the GUI or "native" API you are able to perform via SWORD. This means that even a user with a "Contributor" role can operate on datasets via SWORD. Note that users with the "Contributor" role do not have the ``PublishDataset`` permission and will not be able publish their datasets via any mechanism, GUI or API.

- Datasets versions will only be increased to the next minor version (i.e. 1.1) rather than a major version (2.0) if possible. This depends on the nature of the change.
- Dataverses can be published via SWORD.

- Datasets versions will only be increased to the next minor version (i.e. 1.1) rather than a major version (2.0) if possible. This depends on the nature of the change. Adding or removing, a file, for example, requires a major version bump.

- "Author Affiliation" can now be populated with an XML attribute. For example: <dcterms:creator affiliation="Coffee Bean State University">Stumptown, Jane</dcterms:creator>

Expand All @@ -67,21 +69,23 @@ curl examples
Retrieve SWORD service document
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The service document enumerates the dataverses ("collections" from a SWORD perspective) the user can deposit data into. The "collectionPolicy" element for each dataverse contains the Terms of Use.
The service document enumerates the dataverses ("collections" from a SWORD perspective) the user can deposit data into. The "collectionPolicy" element for each dataverse contains the Terms of Use. Any user with an API token can use this API endpoint. Institution-wide Shibboleth groups are not respected because membership in such a group can only be set via a browser.

``curl -u $API_TOKEN: https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/service-document``

Create a dataset with an Atom entry
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To create a dataset, you must have the "Dataset Creator" role (the ``AddDataset`` permission) on a dataverse. Practically speaking, you should first retrieve the service document to list the dataverses into which you are authorized to deposit data.

``curl -u $API_TOKEN: --data-binary "@path/to/atom-entry-study.xml" -H "Content-Type: application/atom+xml" https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/collection/dataverse/$DATAVERSE_ALIAS``

Example Atom entry (XML)

.. literalinclude:: sword-atom-entry.xml

Dublin Core Terms (DC Terms) Qualified Mapping - Dataverse DB Element Crosswalk
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+-----------------------------+----------------------------------------------+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
|DC (terms: namespace) | Dataverse DB Element | Required | Note |
+=============================+==============================================+==============+=============================================================================================================================================================+
Expand Down Expand Up @@ -117,62 +121,72 @@ Dublin Core Terms (DC Terms) Qualified Mapping - Dataverse DB Element Crosswalk
List datasets in a dataverse
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You must have permission to add datasets in a dataverse (the dataverse should appear in the service document) to list the datasets inside. Institution-wide Shibboleth groups are not respected because membership in such a group can only be set via a browser.

``curl -u $API_TOKEN: https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/collection/dataverse/$DATAVERSE_ALIAS``

Add files to a dataset with a zip file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You must have ``EditDataset`` permission (Contributor role or above such as Curator or Admin) on the dataset to add files.

``curl -u $API_TOKEN: --data-binary @path/to/example.zip -H "Content-Disposition: filename=example.zip" -H "Content-Type: application/zip" -H "Packaging: http://purl.org/net/sword/package/SimpleZip" https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit-media/study/doi:TEST/12345``

Display a dataset atom entry
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You must have ``ViewUnpublishedDataset`` permission (Contributor role or above such as Curator or Admin) on the dataset to view its Atom entry.

Contains data citation (bibliographicCitation), alternate URI (persistent URI of study), edit URI, edit media URI, statement URI.

``curl -u $API_TOKEN: https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:TEST/12345``

Display a dataset statement
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Contains title, author, feed of file entries, latestVersionState, locked boolean, updated timestamp.
Contains title, author, feed of file entries, latestVersionState, locked boolean, updated timestamp. You must have ``ViewUnpublishedDataset`` permission (Contributor role or above such as Curator or Admin) on the dataset to display the statement.

``curl -u $API_TOKEN: https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/statement/study/doi:TEST/12345``

Delete a file by database id
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You must have ``EditDataset`` permission (Contributor role or above such as Curator or Admin) on the dataset to delete files.

``curl -u $API_TOKEN: -X DELETE https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit-media/file/123``

Replacing metadata for a dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Please note that **ALL** metadata (title, author, etc.) will be replaced, including fields that can not be expressed with "dcterms" fields.
Please note that **ALL** metadata (title, author, etc.) will be replaced, including fields that can not be expressed with "dcterms" fields. You must have ``EditDataset`` permission (Contributor role or above such as Curator or Admin) on the dataset to replace metadata.

``curl -u $API_TOKEN: --upload-file "path/to/atom-entry-study2.xml" -H "Content-Type: application/atom+xml" https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:TEST/12345``

Delete a dataset
~~~~~~~~~~~~~~~~

You must have the ``DeleteDatasetDraft`` permission (Contributor role or above such as Curator or Admin) on the dataset to delete it. Please note that if the dataset has never been published you will be able to delete it completely but if the dataset has already been published you will only be able to delete post-publication drafts, never a published version.

``curl -u $API_TOKEN: -i -X DELETE https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:TEST/12345``

Determine if a dataverse has been published
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Look for a `dataverseHasBeenReleased` boolean.
This API endpoint is the same as the "list datasets in a dataverse" endpoint documented above and the same permissions apply but it is documented here separately to point out that you can look for a boolean called ``dataverseHasBeenReleased`` to know if a dataverse has been released, which is required for publishing a dataset.

``curl -u $API_TOKEN: https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/collection/dataverse/$DATAVERSE_ALIAS``

Publish a dataverse
~~~~~~~~~~~~~~~~~~~

The ``cat /dev/null`` and ``--data-binary @-`` arguments are used to send zero-length content to the API, which is required by the upstream library to process the ``In-Progress: false`` header.
The ``cat /dev/null`` and ``--data-binary @-`` arguments are used to send zero-length content to the API, which is required by the upstream library to process the ``In-Progress: false`` header. You must have the ``PublishDataverse`` permission (Admin role) on the dataverse to publish it.

``cat /dev/null | curl -u $API_TOKEN: -X POST -H "In-Progress: false" --data-binary @- https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit/dataverse/$DATAVERSE_ALIAS``

Publish a dataset
~~~~~~~~~~~~~~~~~

The ``cat /dev/null`` and ``--data-binary @-`` arguments are used to send zero-length content to the API, which is required by the upstream library to process the ``In-Progress: false`` header.
The ``cat /dev/null`` and ``--data-binary @-`` arguments are used to send zero-length content to the API, which is required by the upstream library to process the ``In-Progress: false`` header. You must have the ``PublishDataset`` permission (Curator or Admin role) on the dataset to publish it.

``cat /dev/null | curl -u $API_TOKEN: -X POST -H "In-Progress: false" --data-binary @- https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:TEST/12345``

Expand All @@ -181,22 +195,15 @@ The ``cat /dev/null`` and ``--data-binary @-`` arguments are used to send zero-l
Known issues
------------

- Potential mismatch between the dataverses ("collections" from a SWORD perspective) the user can deposit data into in returned by the Service Document and which dataverses the user can actually deposit data into. This is due to an incomplete transition from the old DVN 3.x "admin-only" style permission checking to the new permissions system in Dataverse 4.0 ( https://github.com/IQSS/dataverse/issues/1070 ). The mismatch was reported at https://github.com/IQSS/dataverse/issues/1443

- Should see all the fields filled in for a dataset regardless of what the parent dataverse specifies: https://github.com/IQSS/dataverse/issues/756

- Inefficiency in constructing the ``Service Document``: https://github.com/IQSS/dataverse/issues/784

- Inefficiency in constructing the list of datasets: https://github.com/IQSS/dataverse/issues/784
- Deleting a file from a published version (not a draft) creates a draft but doesn't delete the file: https://github.com/IQSS/dataverse/issues/2464

Roadmap
-------
- The Service Document does not honor groups within groups: https://github.com/IQSS/dataverse/issues/3056

These are features we'd like to add in the future:
- Should see all the fields filled in for a dataset regardless of what the parent dataverse specifies: https://github.com/IQSS/dataverse/issues/756

- Implement SWORD 2.0 Profile 6.4: https://github.com/IQSS/dataverse/issues/183
- SWORD 2.0 Profile 6.4 "Retrieving the content" has not been implemented: https://github.com/IQSS/dataverse/issues/183

- Support deaccessioning via API: https://github.com/IQSS/dataverse/issues/778
- Deaccessioning via API is not supported (it was in DVN 3.x): https://github.com/IQSS/dataverse/issues/778

- Let file metadata (i.e. description) be specified during zip upload: https://github.com/IQSS/dataverse/issues/723

Expand Down
4 changes: 3 additions & 1 deletion doc/sphinx-guides/source/installation/shibboleth.rst
Original file line number Diff line number Diff line change
Expand Up @@ -304,12 +304,14 @@ To create an institution-wide Shibboleth groups, create a JSON file as below and

.. literalinclude:: ../_static/installation/files/etc/shibboleth/shibGroupTestShib.json

Note that institution-wide Shibboleth groups are based on the "Shib-Identity-Provider" attribute but https://github.com/IQSS/dataverse/issues/1515 tracks adding support for arbitrary attributes such as ""eduPersonScopedAffiliation", etc.
Institution-wide Shibboleth groups are based on the "Shib-Identity-Provider" SAML attribute asserted at runtime after successful authentication with the Identity Provider (IdP) and held within the browser session rather than being persisted in the database for any length of time. It is for this reason that roles based on these groups, such as the ability to create a dataset, are not honored by non-browser interactions, such as through the SWORD API.

To list institution-wide Shibboleth groups: ``curl http://localhost:8080/api/admin/groups/shib``

To delete an institution-wide Shibboleth group (assuming id 1): ``curl -X DELETE http://localhost:8080/api/admin/groups/shib/1``

Support for arbitrary attributes beyond "Shib-Identity-Provider" such as "eduPersonScopedAffiliation", etc. is being tracked at https://github.com/IQSS/dataverse/issues/1515

Converting Local Users to Shibboleth
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expand Down
2 changes: 1 addition & 1 deletion scripts/search/tests/grant-authusers-add-on-root
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
#!/bin/sh
. scripts/search/export-keys
OUTPUT=`curl -s -X POST -H "Content-type:application/json" -d "{\"assignee\": \":authenticated-users\",\"role\": \"dvContributor\"}" "http://localhost:8080/api/dataverses/root/assignments?key=$ADMINKEY"`
OUTPUT=`curl -s -X POST -H "Content-type:application/json" -d "{\"assignee\": \":authenticated-users\",\"role\": \"fullContributor\"}" "http://localhost:8080/api/dataverses/root/assignments?key=$ADMINKEY"`
echo $OUTPUT
echo $OUTPUT | jq ' .data | {assignee,_roleAlias}'
14 changes: 11 additions & 3 deletions src/main/java/edu/harvard/iq/dataverse/PermissionServiceBean.java
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
package edu.harvard.iq.dataverse;

import edu.harvard.iq.dataverse.api.datadeposit.SwordAuth;
import edu.harvard.iq.dataverse.authorization.AuthenticationServiceBean;
import edu.harvard.iq.dataverse.authorization.DataverseRole;
import edu.harvard.iq.dataverse.authorization.providers.builtin.BuiltinUserServiceBean;
Expand All @@ -8,6 +9,8 @@
import edu.harvard.iq.dataverse.authorization.RoleAssignee;
import edu.harvard.iq.dataverse.authorization.groups.Group;
import edu.harvard.iq.dataverse.authorization.groups.GroupServiceBean;
import edu.harvard.iq.dataverse.authorization.groups.GroupUtil;
import edu.harvard.iq.dataverse.authorization.groups.impl.builtin.AuthenticatedUsers;
import edu.harvard.iq.dataverse.authorization.users.AuthenticatedUser;
import edu.harvard.iq.dataverse.authorization.users.User;
import edu.harvard.iq.dataverse.engine.command.Command;
Expand Down Expand Up @@ -375,11 +378,16 @@ public RequestPermissionQuery request( DataverseRequest req ) {
* @param permission
* @return The list of dataverses {@code user} has permission {@code permission} on.
*/
public List<Dataverse> getDataversesUserHasPermissionOn(User user, Permission permission) {
public List<Dataverse> getDataversesUserHasPermissionOn(AuthenticatedUser user, Permission permission) {
Set<Group> groups = groupService.groupsFor(user);
String identifiers = GroupUtil.getAllIdentifiersForUser(user, groups);
/**
* @todo What about groups? And how can we make this more performant?
* @todo Are there any strings in identifiers that would break this SQL
* query?
*/
Query nativeQuery = em.createNativeQuery("SELECT id FROM dvobject WHERE dtype = 'Dataverse' and id in (select definitionpoint_id from roleassignment where assigneeidentifier in ('" + user.getIdentifier() + "'));");
String query = "SELECT id FROM dvobject WHERE dtype = 'Dataverse' and id in (select definitionpoint_id from roleassignment where assigneeidentifier in (" + identifiers + "));";
logger.fine("query: " + query);
Query nativeQuery = em.createNativeQuery(query);
List<Integer> dataverseIdsToCheck = nativeQuery.getResultList();
List<Dataverse> dataversesUserHasPermissionOn = new LinkedList<>();
for (int dvIdAsInt : dataverseIdsToCheck) {
Expand Down
Loading