Commit
Merge branch 'develop' into 3609-large-guestbooks
mheppler committed Aug 11, 2017
2 parents 9b3fff0 + dd55c08 commit 39fa4d5
Showing 58 changed files with 2,387 additions and 1,136 deletions.
5 changes: 5 additions & 0 deletions doc/sphinx-guides/source/_static/installation/files/root/big-data-support/checksumValidationSuccess.json
@@ -0,0 +1,5 @@
{
"status": "validation passed",
"uploadFolder": "DNXV2H",
"totalSize": 1234567890
}
64 changes: 64 additions & 0 deletions doc/sphinx-guides/source/developers/big-data-support.rst
@@ -0,0 +1,64 @@
Big Data Support
================

Big data support is highly experimental. Eventually this content will move to the Installation Guide.

.. contents:: |toctitle|
:local:

Various components need to be installed and configured for big data support.

Data Capture Module (DCM)
-------------------------

Data Capture Module (DCM) is an experimental component that allows users to upload large datasets via rsync over ssh.

Install a DCM
~~~~~~~~~~~~~

Installation instructions can be found at https://github.com/sbgrid/data-capture-module . Note that a shared filesystem between Dataverse and your DCM is required. You cannot use a DCM with non-filesystem storage options such as Swift.

Once you have installed a DCM, you will need to configure two database settings on the Dataverse side. These settings are documented in the :doc:`/installation/config` section of the Installation Guide:

- ``:DataCaptureModuleUrl`` should be set to the URL of a DCM you installed.
- ``:UploadMethods`` should be set to ``dcm/rsync+ssh``.

This will allow your Dataverse installation to communicate with your DCM, so that Dataverse can download rsync scripts for your users.
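A minimal sketch of setting both values via the admin settings API (this assumes a local installation answering on port 8080; the DCM URL here is a placeholder):

``curl -X PUT -d 'http://dcm.example.edu' http://localhost:8080/api/admin/settings/:DataCaptureModuleUrl``

``curl -X PUT -d 'dcm/rsync+ssh' http://localhost:8080/api/admin/settings/:UploadMethods``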

Downloading rsync scripts via Dataverse API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The rsync script can be downloaded from Dataverse via API using an authorized API token. In the curl example below, replace ``$PERSISTENT_ID`` with the dataset's DOI or Handle:

``curl -H "X-Dataverse-key: $API_TOKEN" $DV_BASE_URL/api/datasets/:persistentId/dataCaptureModule/rsync?persistentId=$PERSISTENT_ID``
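Assuming the endpoint returns the rsync script body directly, a hypothetical session might save and run it like this (the script name below is illustrative):

``curl -H "X-Dataverse-key: $API_TOKEN" "$DV_BASE_URL/api/datasets/:persistentId/dataCaptureModule/rsync?persistentId=$PERSISTENT_ID" > upload-DNXV2H.bash``

``bash upload-DNXV2H.bash``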

How a DCM reports checksum success or failure to Dataverse
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once the user uploads files to a DCM, that DCM will perform checksum validation and report to Dataverse the results of that validation. The DCM must be configured to pass the API token of a superuser. The implementation details, which are subject to change, are below.

The JSON that a DCM sends to Dataverse on successful checksum validation looks something like the contents of :download:`checksumValidationSuccess.json <../_static/installation/files/root/big-data-support/checksumValidationSuccess.json>` below:

.. literalinclude:: ../_static/installation/files/root/big-data-support/checksumValidationSuccess.json
:language: json

- ``status`` - The valid strings to send are ``validation passed`` and ``validation failed``.
- ``uploadFolder`` - This is the directory on disk where Dataverse should attempt to find the files that a DCM has moved into place. There should always be a ``files.sha`` file and at least one data file. ``files.sha`` is a manifest of all the data files and their checksums. The ``uploadFolder`` directory is inside the directory where data is stored for the dataset and may have the same name as the "identifier" of the persistent ID (DOI or Handle). For example, you would send ``"uploadFolder": "DNXV2H"`` in the JSON file when the absolute path to this directory is ``/usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2/DNXV2H/DNXV2H``.
- ``totalSize`` - Dataverse will use this value to represent the total size in bytes of all the files in the "package" that's created. If 360 data files and one ``files.sha`` manifest file are in the ``uploadFolder``, this value is the sum of the sizes of the 360 data files; the manifest itself is not counted.
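For illustration only, an ``uploadFolder`` and its manifest might look like the sketch below (the file names and checksums are made up, and the manifest is assumed to follow the usual sha1sum-style ``<checksum>  <filename>`` layout)::

    /usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2/DNXV2H/DNXV2H/
    ├── files.sha
    ├── subject01.tar
    └── subject02.tar

    $ cat files.sha
    2fd4e1c67a2d28fced849ee1bb76e7391b93eb12  subject01.tar
    de9f2c7fd25e1b3afad3e85a0bd17d9b100db4b3  subject02.tar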


Here's the syntax for sending the JSON.

``curl -H "X-Dataverse-key: $API_TOKEN" -X POST -H 'Content-type: application/json' --upload-file checksumValidationSuccess.json $DV_BASE_URL/api/datasets/:persistentId/dataCaptureModule/checksumValidation?persistentId=$PERSISTENT_ID``

Troubleshooting
~~~~~~~~~~~~~~~

The following low-level command should only be used when troubleshooting the "import" code a DCM uses, but it is documented here for completeness.

``curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$DV_BASE_URL/api/batch/jobs/import/datasets/files/$DATASET_DB_ID?uploadFolder=$UPLOAD_FOLDER&totalSize=$TOTAL_SIZE"``
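For example, with hypothetical values filled in (a dataset database id of ``42``, plus the ``uploadFolder`` and ``totalSize`` reported in the checksum validation step above):

``curl -H "X-Dataverse-key: $API_TOKEN" -X POST "http://localhost:8080/api/batch/jobs/import/datasets/files/42?uploadFolder=DNXV2H&totalSize=1234567890"``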

Repository Storage Abstraction Layer (RSAL)
-------------------------------------------

For now, please see https://github.com/sbgrid/rsal
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/developers/index.rst
@@ -22,3 +22,4 @@ Developer Guide
unf/index
geospatial
selinux
big-data-support
15 changes: 0 additions & 15 deletions doc/sphinx-guides/source/installation/data-capture-module.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/sphinx-guides/source/installation/index.rst
@@ -20,4 +20,3 @@ Installation Guide
geoconnect
shibboleth
oauth2
data-capture-module
15 changes: 15 additions & 0 deletions scripts/database/upgrades/upgrade_v4.7.1_to_v4.8.sql
@@ -0,0 +1,15 @@
-- Updates the database to add a storage identifier to each DvObject
ALTER TABLE dvobject ADD COLUMN storageidentifier character varying(255);

UPDATE dvobject
SET storageidentifier=(SELECT datafile.filesystemname
FROM datafile
WHERE datafile.id=dvobject.id AND dvobject.dtype='DataFile')
WHERE dvobject.dtype='DataFile';

UPDATE dvobject
SET storageidentifier=(SELECT concat('file://',authority::text,ds.doiseparator::text,ds.identifier::text)
FROM dataset ds
WHERE dvobject.id=ds.id)
WHERE storageidentifier IS NULL;

ALTER TABLE datafile DROP COLUMN filesystemname;
6 changes: 3 additions & 3 deletions src/main/java/Bundle.properties
@@ -175,8 +175,8 @@ notification.access.granted.fileDownloader.additionalDataset={0} You now have ac
notification.access.revoked.dataverse=You have been removed from a role in {0}.
notification.access.revoked.dataset=You have been removed from a role in {0}.
notification.access.revoked.datafile=You have been removed from a role in {0}.
notification.checksumfail=Your upload to dataset "{0}" failed checksum validation.
notification.import.filesystem=<a href="{0}/dataset.xhtml?persistentId={1}" title="{2}">{2}</a>, dataset had files imported from the file system via a batch job.
notification.checksumfail=One or more files in your upload failed checksum validation for dataset {0}. Please re-run the upload script. If the problem persists, please contact support.
notification.import.filesystem=Dataset <a href="{0}/dataset.xhtml?persistentId={1}" title="{2}">{2}</a> has been successfully uploaded and verified.
notification.import.checksum=<a href="/dataset.xhtml?persistentId={0}" title="{1}">{1}</a>, dataset had file checksums added via a batch job.
removeNotification=Remove Notification
groupAndRoles.manageTips=Here is where you can access and manage all the groups you belong to, and the roles you have been assigned.
@@ -552,7 +552,7 @@ notification.email.welcome=Welcome to {0}! Get started by adding or finding data
notification.email.welcomeConfirmEmailAddOn=\n\nPlease verify your email address at {0}. Note, the verify link will expire after {1}. Send another verification email by visiting your account page.
notification.email.requestFileAccess=File access requested for dataset: {0}. Manage permissions at {1}.
notification.email.grantFileAccess=Access granted for files in dataset: {0} (view at {1}).
notification.email.rejectFileAccess=Access rejected for requested files in dataset: {0} (view at {1}).
notification.email.rejectFileAccess=Your request for access was rejected for the requested files in the dataset: {0} (view at {1}). If you have any questions about why your request was rejected, you may reach the dataset owner using the "Contact" link in the upper right corner of the dataset page.
# Bundle file editors, please note that "notification.email.createDataverse" is used in a unit test
notification.email.createDataverse=Your new dataverse named {0} (view at {1} ) was created in {2} (view at {3} ). To learn more about what you can do with your dataverse, check out the Dataverse Management - User Guide at {4}/{5}/user/dataverse-management.html .
# Bundle file editors, please note that "notification.email.createDataset" is used in a unit test
38 changes: 19 additions & 19 deletions src/main/java/edu/harvard/iq/dataverse/DataFile.java
@@ -10,7 +10,7 @@
import edu.harvard.iq.dataverse.api.WorldMapRelatedData;
import edu.harvard.iq.dataverse.authorization.users.AuthenticatedUser;
import edu.harvard.iq.dataverse.dataaccess.DataAccess;
import edu.harvard.iq.dataverse.dataaccess.DataFileIO;
import edu.harvard.iq.dataverse.dataaccess.StorageIO;
import edu.harvard.iq.dataverse.dataset.DatasetThumbnail;
import edu.harvard.iq.dataverse.ingest.IngestReport;
import edu.harvard.iq.dataverse.ingest.IngestRequest;
@@ -81,10 +81,10 @@ public class DataFile extends DvObject implements Comparable {
private String contentType;


@Expose
@SerializedName("storageIdentifier")
@Column( nullable = false )
private String fileSystemName;
// @Expose
// @SerializedName("storageIdentifier")
// @Column( nullable = false )
// private String fileSystemName;

/**
* End users will see "SHA-1" (with a hyphen) rather than "SHA1" in the GUI
@@ -257,7 +257,7 @@ public DataTable getDataTable() {

public void setDataTable(DataTable dt) {
if (this.getDataTables() == null) {
this.setDataTables( new ArrayList<>() );
this.setDataTables(new ArrayList<>());
} else {
this.getDataTables().clear();
}
@@ -278,7 +278,7 @@ public List<String> getTagLabels(){

for (DataFileTag element : currentDataTags) {
tagStrings.add(element.getTypeLabel());
}
}
}
return tagStrings;
}
@@ -437,13 +437,13 @@ public void setOwner(Dataset dataset) {
super.setOwner(dataset);
}

public String getStorageIdentifier() {
return this.fileSystemName;
}

public void setStorageIdentifier(String storageIdentifier) {
this.fileSystemName = storageIdentifier;
}
// public String getStorageIdentifier() {
// return this.fileSystemName;
// }
//
// public void setStorageIdentifier(String storageIdentifier) {
// this.fileSystemName = storageIdentifier;
// }

public String getDescription() {
FileMetadata fmd = getLatestFileMetadata();
@@ -555,14 +555,14 @@ public String getOriginalChecksumType() {
return BundleUtil.getStringFromBundle("file.originalChecksumType", Arrays.asList(this.checksumType.toString()) );
}

public DataFileIO getDataFileIO() throws IOException {
DataFileIO dataFileIO = DataAccess.getDataFileIO(this);
public StorageIO<DataFile> getStorageIO() throws IOException {
StorageIO<DataFile> storageIO = DataAccess.getStorageIO(this);

if (dataFileIO == null) {
throw new IOException("Failed to create DataFileIO for datafile.");
if (storageIO == null) {
throw new IOException("Failed to create storageIO for datafile.");
}

return dataFileIO;
return storageIO;
}

/*
src/main/java/edu/harvard/iq/dataverse/DataFileServiceBean.java
@@ -198,8 +198,8 @@ public List<DataFile> findAllRelatedByRootDatafileId(Long datafileId) {

public DataFile findByStorageIdandDatasetVersion(String storageId, DatasetVersion dv) {
try {
Query query = em.createNativeQuery("select o.id from datafile o, filemetadata m " +
"where o.filesystemname = '" + storageId + "' and o.id = m.datafile_id and m.datasetversion_id = " +
Query query = em.createNativeQuery("select o.id from dvobject o, filemetadata m " +
"where o.storageidentifier = '" + storageId + "' and o.id = m.datafile_id and m.datasetversion_id = " +
dv.getId() + "");
query.setMaxResults(1);
if (query.getResultList().size() < 1) {
@@ -308,7 +308,7 @@ public DataFile findCheapAndEasy(Long id) {
Object[] result;

try {
result = (Object[]) em.createNativeQuery("SELECT t0.ID, t0.CREATEDATE, t0.INDEXTIME, t0.MODIFICATIONTIME, t0.PERMISSIONINDEXTIME, t0.PERMISSIONMODIFICATIONTIME, t0.PUBLICATIONDATE, t0.CREATOR_ID, t0.RELEASEUSER_ID, t0.PREVIEWIMAGEAVAILABLE, t1.CONTENTTYPE, t1.FILESYSTEMNAME, t1.FILESIZE, t1.INGESTSTATUS, t1.CHECKSUMVALUE, t1.RESTRICTED, t3.ID, t3.AUTHORITY, t3.IDENTIFIER, t1.CHECKSUMTYPE, t1.PREVIOUSDATAFILEID, t1.ROOTDATAFILEID FROM DVOBJECT t0, DATAFILE t1, DVOBJECT t2, DATASET t3 WHERE ((t0.ID = " + id + ") AND (t0.OWNER_ID = t2.ID) AND (t2.ID = t3.ID) AND (t1.ID = t0.ID))").getSingleResult();
result = (Object[]) em.createNativeQuery("SELECT t0.ID, t0.CREATEDATE, t0.INDEXTIME, t0.MODIFICATIONTIME, t0.PERMISSIONINDEXTIME, t0.PERMISSIONMODIFICATIONTIME, t0.PUBLICATIONDATE, t0.CREATOR_ID, t0.RELEASEUSER_ID, t0.PREVIEWIMAGEAVAILABLE, t1.CONTENTTYPE, t0.STORAGEIDENTIFIER, t1.FILESIZE, t1.INGESTSTATUS, t1.CHECKSUMVALUE, t1.RESTRICTED, t3.ID, t3.AUTHORITY, t3.IDENTIFIER, t1.CHECKSUMTYPE, t1.PREVIOUSDATAFILEID, t1.ROOTDATAFILEID FROM DVOBJECT t0, DATAFILE t1, DVOBJECT t2, DATASET t3 WHERE ((t0.ID = " + id + ") AND (t0.OWNER_ID = t2.ID) AND (t2.ID = t3.ID) AND (t1.ID = t0.ID))").getSingleResult();
} catch (Exception ex) {
return null;
}
@@ -562,7 +562,7 @@ public void findFileMetadataOptimizedExperimental(Dataset owner, DatasetVersion

i = 0;

List<Object[]> fileResults = em.createNativeQuery("SELECT t0.ID, t0.CREATEDATE, t0.INDEXTIME, t0.MODIFICATIONTIME, t0.PERMISSIONINDEXTIME, t0.PERMISSIONMODIFICATIONTIME, t0.PUBLICATIONDATE, t0.CREATOR_ID, t0.RELEASEUSER_ID, t1.CONTENTTYPE, t1.FILESYSTEMNAME, t1.FILESIZE, t1.INGESTSTATUS, t1.CHECKSUMVALUE, t1.RESTRICTED, t1.CHECKSUMTYPE, t1.PREVIOUSDATAFILEID, t1.ROOTDATAFILEID FROM DVOBJECT t0, DATAFILE t1 WHERE ((t0.OWNER_ID = " + owner.getId() + ") AND ((t1.ID = t0.ID) AND (t0.DTYPE = 'DataFile'))) ORDER BY t0.ID").getResultList();
List<Object[]> fileResults = em.createNativeQuery("SELECT t0.ID, t0.CREATEDATE, t0.INDEXTIME, t0.MODIFICATIONTIME, t0.PERMISSIONINDEXTIME, t0.PERMISSIONMODIFICATIONTIME, t0.PUBLICATIONDATE, t0.CREATOR_ID, t0.RELEASEUSER_ID, t1.CONTENTTYPE, t0.STORAGEIDENTIFIER, t1.FILESIZE, t1.INGESTSTATUS, t1.CHECKSUMVALUE, t1.RESTRICTED, t1.CHECKSUMTYPE, t1.PREVIOUSDATAFILEID, t1.ROOTDATAFILEID FROM DVOBJECT t0, DATAFILE t1 WHERE ((t0.OWNER_ID = " + owner.getId() + ") AND ((t1.ID = t0.ID) AND (t0.DTYPE = 'DataFile'))) ORDER BY t0.ID").getResultList();

for (Object[] result : fileResults) {
Integer file_id = (Integer) result[0];
29 changes: 28 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/DatasetField.java
@@ -272,6 +272,9 @@ public String getCompoundDisplayValue() {
return returnString;
}

/**
 * Despite the name, this returns a list of display values, not a list of values.
 */
public List<String> getValues() {
List<String> returnList = new ArrayList<>();
if (!datasetFieldValues.isEmpty()) {
@@ -287,9 +290,33 @@ public List<String> getValues() {
}
return returnList;
}
/**
 * List of values (as opposed to display values); used for passing to Solr for indexing.
 */
public List<String> getValues_nondisplay()
{
List<String> returnList = new ArrayList<>();
if (!datasetFieldValues.isEmpty()) {
for (DatasetFieldValue dsfv : datasetFieldValues) {
returnList.add(dsfv.getValue());
}
} else {
for (ControlledVocabularyValue cvv : controlledVocabularyValues) {
if (cvv != null && cvv.getStrValue() != null) {
returnList.add(cvv.getStrValue());
}
}
}
return returnList;
}

/**
 * Appears to be used only for sending info to Solr; changed to return values
 * instead of display values.
 */
public List<String> getValuesWithoutNaValues() {
List<String> returnList = getValues();
List<String> returnList = getValues_nondisplay();
returnList.removeAll(Arrays.asList(NA_VALUE));
return returnList;
}
6 changes: 3 additions & 3 deletions src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
@@ -5,7 +5,7 @@
import edu.harvard.iq.dataverse.authorization.providers.builtin.BuiltinUserServiceBean;
import edu.harvard.iq.dataverse.authorization.users.AuthenticatedUser;
import edu.harvard.iq.dataverse.authorization.users.PrivateUrlUser;
import edu.harvard.iq.dataverse.dataaccess.DataFileIO;
import edu.harvard.iq.dataverse.dataaccess.StorageIO;
import edu.harvard.iq.dataverse.dataaccess.ImageThumbConverter;
import edu.harvard.iq.dataverse.dataaccess.SwiftAccessIO;
import edu.harvard.iq.dataverse.dataset.DatasetThumbnail;
@@ -414,9 +414,9 @@ public String getSwiftContainerName(){

String swiftContainerName;
try {
DataFileIO dataFileIO = getInitialDataFile().getDataFileIO();
StorageIO<DataFile> storageIO = getInitialDataFile().getStorageIO();
try {
SwiftAccessIO swiftIO = (SwiftAccessIO)dataFileIO;
SwiftAccessIO<DataFile> swiftIO = (SwiftAccessIO<DataFile>) storageIO;
swiftIO.open();
swiftContainerName = swiftIO.getSwiftContainerName();
logger.info("Swift container name: " + swiftContainerName);
src/main/java/edu/harvard/iq/dataverse/DatasetServiceBean.java
@@ -820,7 +820,7 @@ public Dataset setNonDatasetFileAsThumbnail(Dataset dataset, InputStream inputSt
logger.fine("In setNonDatasetFileAsThumbnail but inputStream is null! Returning null.");
return null;
}
dataset = DatasetUtil.persistDatasetLogoToDiskAndCreateThumbnail(dataset, inputStream);
dataset = DatasetUtil.persistDatasetLogoToStorageAndCreateThumbnail(dataset, inputStream);
dataset.setThumbnailFile(null);
return merge(dataset);
}