Merge pull request #4034 from IQSS/3942-DCM-notifications
3942 dcm notifications
kcondon authored Aug 10, 2017
2 parents 8223647 + 103b40c commit dd55c08
Showing 20 changed files with 662 additions and 300 deletions.
5 changes: 5 additions & 0 deletions doc/sphinx-guides/source/_static/installation/files/root/big-data-support/checksumValidationSuccess.json
@@ -0,0 +1,5 @@
{
"status": "validation passed",
"uploadFolder": "DNXV2H",
"totalSize": 1234567890
}
64 changes: 64 additions & 0 deletions doc/sphinx-guides/source/developers/big-data-support.rst
@@ -0,0 +1,64 @@
Big Data Support
================

Big data support is highly experimental. Eventually this content will move to the Installation Guide.

.. contents:: |toctitle|
:local:

Various components need to be installed and configured for big data support.

Data Capture Module (DCM)
-------------------------

Data Capture Module (DCM) is an experimental component that allows users to upload large datasets via rsync over ssh.

Install a DCM
~~~~~~~~~~~~~

Installation instructions can be found at https://github.com/sbgrid/data-capture-module . Note that a shared filesystem between Dataverse and your DCM is required. You cannot use a DCM with non-filesystem storage options such as Swift.

Once you have installed a DCM, you will need to configure two database settings on the Dataverse side. These settings are documented in the :doc:`/installation/config` section of the Installation Guide:

- ``:DataCaptureModuleUrl`` should be set to the URL of a DCM you installed.
- ``:UploadMethods`` should be set to ``dcm/rsync+ssh``.

This will allow your Dataverse installation to communicate with your DCM, so that Dataverse can download rsync scripts for your users.
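
For example, both settings can be set with curl via the "admin" API. This is a sketch that assumes your DCM is reachable at ``http://dcm.example.edu`` (a hypothetical hostname) and that Dataverse is running on ``localhost:8080``::

    curl -X PUT -d 'http://dcm.example.edu' http://localhost:8080/api/admin/settings/:DataCaptureModuleUrl
    curl -X PUT -d 'dcm/rsync+ssh' http://localhost:8080/api/admin/settings/:UploadMethods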

Downloading rsync scripts via Dataverse API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The rsync script can be downloaded from Dataverse via the API using an authorized user's API token. In the curl example below, replace ``$PERSISTENT_ID`` with a DOI or Handle:

``curl -H "X-Dataverse-key: $API_TOKEN" $DV_BASE_URL/api/datasets/:persistentId/dataCaptureModule/rsync?persistentId=$PERSISTENT_ID``

How a DCM reports checksum success or failure to Dataverse
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once a user uploads files to a DCM, the DCM performs checksum validation and reports the results to Dataverse. The DCM must be configured to pass the API token of a superuser. The implementation details, which are subject to change, are below.

The JSON that a DCM sends to Dataverse on successful checksum validation looks something like the contents of :download:`checksumValidationSuccess.json <../_static/installation/files/root/big-data-support/checksumValidationSuccess.json>` below:

.. literalinclude:: ../_static/installation/files/root/big-data-support/checksumValidationSuccess.json
:language: json

- ``status`` - The valid strings to send are ``validation passed`` and ``validation failed``.
- ``uploadFolder`` - This is the directory on disk where Dataverse should attempt to find the files that a DCM has moved into place. There should always be a ``files.sha`` file and at least one data file. ``files.sha`` is a manifest of all the data files and their checksums. The ``uploadFolder`` directory is inside the directory where data is stored for the dataset and may have the same name as the "identifier" of the persistent id (DOI or Handle). For example, you would send ``"uploadFolder": "DNXV2H"`` in the JSON file when the absolute path to this directory is ``/usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2/DNXV2H/DNXV2H``.
- ``totalSize`` - Dataverse will use this value to represent the total size in bytes of all the files in the "package" that's created. If 360 data files and one ``files.sha`` manifest file are in the ``uploadFolder``, this value is the sum of the sizes of the 360 data files only; the manifest is not counted.


Here is the syntax for sending the JSON to Dataverse:

``curl -H "X-Dataverse-key: $API_TOKEN" -X POST -H 'Content-type: application/json' --upload-file checksumValidationSuccess.json $DV_BASE_URL/api/datasets/:persistentId/dataCaptureModule/checksumValidation?persistentId=$PERSISTENT_ID``

Troubleshooting
~~~~~~~~~~~~~~~

The following low-level command should only be used when troubleshooting the "import" code a DCM uses, but it is documented here for completeness.

``curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$DV_BASE_URL/api/batch/jobs/import/datasets/files/$DATASET_DB_ID?uploadFolder=$UPLOAD_FOLDER&totalSize=$TOTAL_SIZE"``

Repository Storage Abstraction Layer (RSAL)
-------------------------------------------

For now, please see https://github.com/sbgrid/rsal
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/developers/index.rst
@@ -22,3 +22,4 @@ Developer Guide
unf/index
geospatial
selinux
big-data-support
15 changes: 0 additions & 15 deletions doc/sphinx-guides/source/installation/data-capture-module.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/sphinx-guides/source/installation/index.rst
@@ -20,4 +20,3 @@ Installation Guide
geoconnect
shibboleth
oauth2
data-capture-module
4 changes: 2 additions & 2 deletions src/main/java/Bundle.properties
@@ -175,8 +175,8 @@ notification.access.granted.fileDownloader.additionalDataset={0} You now have ac
notification.access.revoked.dataverse=You have been removed from a role in {0}.
notification.access.revoked.dataset=You have been removed from a role in {0}.
notification.access.revoked.datafile=You have been removed from a role in {0}.
-notification.checksumfail=Your upload to dataset "{0}" failed checksum validation.
-notification.import.filesystem=<a href="{0}/dataset.xhtml?persistentId={1}" title="{2}"&>{2}</a>, dataset had files imported from the file system via a batch job.
+notification.checksumfail=One or more files in your upload failed checksum validation for dataset {0}. Please re-run the upload script. If the problem persists, please contact support.
+notification.import.filesystem=Dataset <a href="{0}/dataset.xhtml?persistentId={1}" title="{2}"&>{2}</a> has been successfully uploaded and verified.
notification.import.checksum=<a href="/dataset.xhtml?persistentId={0}" title="{1}"&>{1}</a>, dataset had file checksums added via a batch job.
removeNotification=Remove Notification
groupAndRoles.manageTips=Here is where you can access and manage all the groups you belong to, and the roles you have been assigned.
10 changes: 5 additions & 5 deletions src/main/java/edu/harvard/iq/dataverse/MailServiceBean.java
@@ -471,11 +471,11 @@ public String getMessageTextBasedOnNotification(UserNotification userNotificatio
return messageText += accountCreatedMessage;

case CHECKSUMFAIL:
-version = (DatasetVersion) targetObject;
+dataset = (Dataset) targetObject;
String checksumFailMsg = BundleUtil.getStringFromBundle("notification.checksumfail", Arrays.asList(
-version.getDataset().getGlobalId()
+dataset.getGlobalId()
));
-logger.info("checksumFailMsg: " + checksumFailMsg);
+logger.fine("checksumFailMsg: " + checksumFailMsg);
return messageText += checksumFailMsg;

case FILESYSTEMIMPORT:
@@ -485,7 +485,7 @@ public String getMessageTextBasedOnNotification(UserNotificatio
version.getDataset().getGlobalId(),
version.getDataset().getDisplayName()
));
logger.info("fileImportMsg: " + fileImportMsg);
logger.fine("fileImportMsg: " + fileImportMsg);
return messageText += fileImportMsg;

case CHECKSUMIMPORT:
@@ -494,7 +494,7 @@ public String getMessageTextBasedOnNotification(UserNotificatio
version.getDataset().getGlobalId(),
version.getDataset().getDisplayName()
));
logger.info("checksumImportMsg: " + checksumImportMsg);
logger.fine("checksumImportMsg: " + checksumImportMsg);
return messageText += checksumImportMsg;

}
12 changes: 12 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/PermissionServiceBean.java
@@ -27,6 +27,7 @@
import static edu.harvard.iq.dataverse.engine.command.CommandHelper.CH;
import edu.harvard.iq.dataverse.engine.command.DataverseRequest;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.logging.Level;
import java.util.stream.Collectors;
@@ -418,6 +419,17 @@ public List<AuthenticatedUser> getUsersWithPermissionOn(Permission permission, D
}

return usersHasPermissionOn;
}

public Map<String, AuthenticatedUser> getDistinctUsersWithPermissionOn(Permission permission, DvObject dvo) {

List<AuthenticatedUser> users = getUsersWithPermissionOn(permission, dvo);
Map<String, AuthenticatedUser> distinctUsers = new HashMap<>();
users.forEach((au) -> {
distinctUsers.put(au.getIdentifier(), au);
});

return distinctUsers;
}

public List<Long> getDvObjectsUserHasRoleOn(User user) {
37 changes: 36 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/api/AbstractApiBean.java
@@ -26,6 +26,7 @@
import edu.harvard.iq.dataverse.authorization.users.PrivateUrlUser;
import edu.harvard.iq.dataverse.authorization.users.User;
import edu.harvard.iq.dataverse.confirmemail.ConfirmEmailServiceBean;
import edu.harvard.iq.dataverse.datacapturemodule.DataCaptureModuleServiceBean;
import edu.harvard.iq.dataverse.engine.command.Command;
import edu.harvard.iq.dataverse.engine.command.DataverseRequest;
import edu.harvard.iq.dataverse.engine.command.exception.CommandException;
@@ -34,12 +35,14 @@
import edu.harvard.iq.dataverse.privateurl.PrivateUrlServiceBean;
import edu.harvard.iq.dataverse.search.savedsearch.SavedSearchServiceBean;
import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
import edu.harvard.iq.dataverse.util.BundleUtil;
import edu.harvard.iq.dataverse.util.SystemConfig;
import edu.harvard.iq.dataverse.util.json.JsonParser;
import edu.harvard.iq.dataverse.util.json.NullSafeJsonBuilder;
import edu.harvard.iq.dataverse.validation.BeanValidationServiceBean;
import java.io.StringReader;
import java.net.URI;
import java.util.Collections;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.logging.Level;
@@ -70,6 +73,7 @@ public abstract class AbstractApiBean {

private static final Logger logger = Logger.getLogger(AbstractApiBean.class.getName());
private static final String DATAVERSE_KEY_HEADER_NAME = "X-Dataverse-key";
private static final String PERSISTENT_ID_KEY=":persistentId";
public static final String STATUS_ERROR = "ERROR";
public static final String STATUS_OK = "OK";

@@ -199,6 +203,9 @@ String getWrappedMessageWhenJson() {
@EJB
protected SystemConfig systemConfig;

@EJB
protected DataCaptureModuleServiceBean dataCaptureModuleSvc;

@PersistenceContext(unitName = "VDCNet-ejbPU")
protected EntityManager em;

@@ -329,7 +336,6 @@ protected AuthenticatedUser findAuthenticatedUserOrDie() throws WrappedResponse
private AuthenticatedUser findAuthenticatedUserOrDie( String key ) throws WrappedResponse {
AuthenticatedUser authUser = authSvc.lookupUser(key);
if ( authUser != null ) {
-System.out.println("Updating lastApiUseTime for authenticated user via abstractapibean");
authUser = userSvc.updateLastApiUseTime(authUser);

return authUser;
@@ -344,6 +350,35 @@ protected Dataverse findDataverseOrDie( String dvIdtf ) throws WrappedResponse {
}
return dv;
}


protected Dataset findDatasetOrDie(String id) throws WrappedResponse {
Dataset dataset;
if (id.equals(PERSISTENT_ID_KEY)) {
String persistentId = getRequestParameter(PERSISTENT_ID_KEY.substring(1));
if (persistentId == null) {
throw new WrappedResponse(
badRequest(BundleUtil.getStringFromBundle("find.dataset.error.dataset_id_is_null", Collections.singletonList(PERSISTENT_ID_KEY.substring(1)))));
}
dataset = datasetSvc.findByGlobalId(persistentId);
if (dataset == null) {
throw new WrappedResponse(notFound(BundleUtil.getStringFromBundle("find.dataset.error.dataset.not.found.persistentId", Collections.singletonList(persistentId))));
}
return dataset;

} else {
try {
dataset = datasetSvc.find(Long.parseLong(id));
if (dataset == null) {
throw new WrappedResponse(notFound(BundleUtil.getStringFromBundle("find.dataset.error.dataset.not.found.id", Collections.singletonList(id))));
}
return dataset;
} catch (NumberFormatException nfe) {
throw new WrappedResponse(
badRequest(BundleUtil.getStringFromBundle("find.dataset.error.dataset.not.found.bad.id", Collections.singletonList(id))));
}
}
}

protected DataverseRequest createDataverseRequest( User u ) {
return new DataverseRequest(u, httpRequest);