Commit 1e52809

Merge pull request #8751 from GlobalDataverseCommunityConsortium/GDCC/8749-S3Archiver
GDCC/8749 S3 Archiver
2 parents ac49488 + 7d239ba commit 1e52809

8 files changed: +337 −31 lines

doc/sphinx-guides/source/installation/config.rst

+35-2
@@ -1081,7 +1081,9 @@ These archival Bags include all of the files and metadata in a given dataset ver
 
 The Dataverse Software offers an internal archive workflow which may be configured as a PostPublication workflow via an admin API call to manually submit previously published Datasets and prior versions to a configured archive such as Chronopolis. The workflow creates a `JSON-LD <http://www.openarchives.org/ore/0.9/jsonld>`_ serialized `OAI-ORE <https://www.openarchives.org/ore/>`_ map file, which is also available as a metadata export format in the Dataverse Software web interface.
 
-At present, the DPNSubmitToArchiveCommand, LocalSubmitToArchiveCommand, and GoogleCloudSubmitToArchive are the only implementations extending the AbstractSubmitToArchiveCommand and using the configurable mechanisms discussed below.
+At present, archiving classes include the DuraCloudSubmitToArchiveCommand, LocalSubmitToArchiveCommand, GoogleCloudSubmitToArchive, and S3SubmitToArchiveCommand, which all extend the AbstractSubmitToArchiveCommand and use the configurable mechanisms discussed below.
+
+All current options support the archival status APIs, and the same status is available in the dataset page version table (for contributors/those who could view the unpublished dataset, with more detail available to superusers).
 
 .. _Duracloud Configuration:
 
@@ -1144,7 +1146,7 @@ ArchiverClassName - the fully qualified class to be used for archiving. For exam
 Google Cloud Configuration
 ++++++++++++++++++++++++++
 
-The Google Cloud Archiver can send archival Bags to a bucket in Google's cloud, including those in the 'Coldline' storage class (cheaper, with slower access)
+The Google Cloud Archiver can send Dataverse Archival Bags to a bucket in Google's cloud, including those in the 'Coldline' storage class (cheaper, with slower access).
 
 ``curl http://localhost:8080/api/admin/settings/:ArchiverClassName -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.GoogleCloudSubmitToArchiveCommand"``
 
@@ -1168,6 +1170,31 @@ For example:
 
 ``cp <your key file> /usr/local/payara5/glassfish/domains/domain1/files/googlecloudkey.json``
 
+.. _S3 Archiver Configuration:
+
+S3 Configuration
+++++++++++++++++
+
+The S3 Archiver can send Dataverse Archival Bags to a bucket at any S3 endpoint. The configuration for the S3 Archiver is independent of any S3 store that may be configured in Dataverse and may, for example, leverage colder (cheaper, slower access) storage.
+
+``curl http://localhost:8080/api/admin/settings/:ArchiverClassName -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.S3SubmitToArchiveCommand"``
+
+``curl http://localhost:8080/api/admin/settings/:ArchiverSettings -X PUT -d ":S3ArchiverConfig, :BagGeneratorThreads"``
+
+The S3 Archiver defines one custom setting, the required :S3ArchiverConfig. It can also use the :BagGeneratorThreads setting as described in the DuraCloud Configuration section above.
+
+The credentials for your S3 account can be stored in a profile in a standard credentials file (e.g. ~/.aws/credentials) referenced via the "profile" key in the :S3ArchiverConfig setting (defaulting to the "default" entry), or can be set via MicroProfile settings as described for S3 stores (dataverse.s3archiver.access-key and dataverse.s3archiver.secret-key).
+
+The :S3ArchiverConfig setting is a JSON object that must include an "s3_bucket_name" and may include additional S3-related parameters as described for S3 Stores, including "profile", "connection-pool-size", "custom-endpoint-url", "custom-endpoint-region", "path-style-access", "payload-signing", and "chunked-encoding".
+
+\:S3ArchiverConfig - minimally includes the name of the bucket to use. For example:
+
+``curl http://localhost:8080/api/admin/settings/:S3ArchiverConfig -X PUT -d '{"s3_bucket_name":"archival-bucket"}'``
+
+\:S3ArchiverConfig - can also set the name of an S3 profile to use. For example:
+
+``curl http://localhost:8080/api/admin/settings/:S3ArchiverConfig -X PUT -d '{"s3_bucket_name":"archival-bucket", "profile":"archiver"}'``
+
 .. _Archiving API Call:
 
 API Calls
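
Note: since :S3ArchiverConfig accepts the endpoint-related keys listed above, the same setting can point the archiver at a non-AWS S3 endpoint. A hypothetical example (the endpoint URL and region are placeholder values, not taken from this commit):

``curl http://localhost:8080/api/admin/settings/:S3ArchiverConfig -X PUT -d '{"s3_bucket_name":"archival-bucket", "custom-endpoint-url":"https://s3.example.org", "custom-endpoint-region":"us-east-1"}'``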
@@ -2665,6 +2692,12 @@ This is the local file system path to be used with the LocalSubmitToArchiveComma
 
 These are the bucket and project names to be used with the GoogleCloudSubmitToArchiveCommand class. Further information is in the :ref:`Google Cloud Configuration` section above.
 
+:S3ArchiverConfig
++++++++++++++++++
+
+This is the JSON configuration object setting to be used with the S3SubmitToArchiveCommand class. Further information is in the :ref:`S3 Archiver Configuration` section above.
+
+
 .. _:InstallationName:
 
 :InstallationName
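
Note: in the "profile":"archiver" example above, the "profile" key selects a named section of the standard AWS credentials file. A minimal sketch of the matching ~/.aws/credentials entry (placeholder values):

  [archiver]
  aws_access_key_id = <placeholder-access-key>
  aws_secret_access_key = <placeholder-secret-key>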

src/main/java/edu/harvard/iq/dataverse/DatasetPage.java

+3-8
@@ -5599,7 +5599,7 @@ public boolean isArchivable() {
                 archivable = ((Boolean) m.invoke(null, params) == true);
             } catch (ClassNotFoundException | IllegalAccessException | IllegalArgumentException
                     | InvocationTargetException | NoSuchMethodException | SecurityException e) {
-                logger.warning("Failed to call is Archivable on configured archiver class: " + className);
+                logger.warning("Failed to call isArchivable on configured archiver class: " + className);
                 e.printStackTrace();
             }
         }
@@ -5635,7 +5635,7 @@ public boolean isVersionArchivable() {
                 }
             } catch (ClassNotFoundException | IllegalAccessException | IllegalArgumentException
                     | InvocationTargetException | NoSuchMethodException | SecurityException e) {
-                logger.warning("Failed to call is Archivable on configured archiver class: " + className);
+                logger.warning("Failed to call isSingleVersion on configured archiver class: " + className);
                 e.printStackTrace();
             }
         }
@@ -5646,12 +5646,7 @@ public boolean isVersionArchivable() {
 
     public boolean isSomeVersionArchived() {
         if (someVersionArchived == null) {
-            someVersionArchived = false;
-            for (DatasetVersion dv : dataset.getVersions()) {
-                if (dv.getArchivalCopyLocation() != null) {
-                    someVersionArchived = true;
-                }
-            }
+            someVersionArchived = ArchiverUtil.isSomeVersionArchived(dataset);
         }
         return someVersionArchived;
     }
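
Note: ArchiverUtil.isSomeVersionArchived itself is not visible in this commit's diff. A minimal sketch of what the extracted helper plausibly looks like, reconstructed from the inlined loop removed above (the package and class location are assumptions, not confirmed by this diff):

  package edu.harvard.iq.dataverse.util; // assumed location

  import edu.harvard.iq.dataverse.Dataset;
  import edu.harvard.iq.dataverse.DatasetVersion;

  public class ArchiverUtil {
      // Returns true if any version of the dataset has an archival copy location recorded.
      // An early return is the only change from the loop this commit removes from DatasetPage.
      public static boolean isSomeVersionArchived(Dataset dataset) {
          for (DatasetVersion dv : dataset.getVersions()) {
              if (dv.getArchivalCopyLocation() != null) {
                  return true;
              }
          }
          return false;
      }
  }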

src/main/java/edu/harvard/iq/dataverse/engine/command/impl/AbstractSubmitToArchiveCommand.java

+10-1
@@ -1,5 +1,7 @@
 package edu.harvard.iq.dataverse.engine.command.impl;
 
+import edu.harvard.iq.dataverse.DOIDataCiteRegisterService;
+import edu.harvard.iq.dataverse.DataCitation;
 import edu.harvard.iq.dataverse.Dataset;
 import edu.harvard.iq.dataverse.DatasetVersion;
 import edu.harvard.iq.dataverse.DvObject;
@@ -94,6 +96,13 @@ public String describe() {
         return super.describe() + "DatasetVersion: [" + version.getId() + " (v"
                 + version.getFriendlyVersionNumber() + ")]";
     }
+
+    String getDataCiteXml(DatasetVersion dv) {
+        DataCitation dc = new DataCitation(dv);
+        Map<String, String> metadata = dc.getDataCiteMetadata();
+        return DOIDataCiteRegisterService.getMetadataFromDvObject(dv.getDataset().getGlobalId().asString(), metadata,
+                dv.getDataset());
+    }
 
     public Thread startBagThread(DatasetVersion dv, PipedInputStream in, DigestInputStream digestInputStream2,
             String dataciteXml, ApiToken token) throws IOException, InterruptedException {
@@ -160,7 +169,7 @@ public void run() {
         }
         return bagThread;
     }
-
+    
     public static boolean isArchivable(Dataset dataset, SettingsWrapper settingsWrapper) {
         return true;
     }

src/main/java/edu/harvard/iq/dataverse/engine/command/impl/DuraCloudSubmitToArchiveCommand.java

+1-6
@@ -1,7 +1,5 @@
 package edu.harvard.iq.dataverse.engine.command.impl;
 
-import edu.harvard.iq.dataverse.DOIDataCiteRegisterService;
-import edu.harvard.iq.dataverse.DataCitation;
 import edu.harvard.iq.dataverse.Dataset;
 import edu.harvard.iq.dataverse.DatasetVersion;
 import edu.harvard.iq.dataverse.DatasetLock.Reason;
@@ -108,10 +106,7 @@ public WorkflowStepResult performArchiveSubmission(DatasetVersion dv, ApiToken t
             if (!store.spaceExists(spaceName)) {
                 store.createSpace(spaceName);
             }
-            DataCitation dc = new DataCitation(dv);
-            Map<String, String> metadata = dc.getDataCiteMetadata();
-            String dataciteXml = DOIDataCiteRegisterService.getMetadataFromDvObject(
-                    dv.getDataset().getGlobalId().asString(), metadata, dv.getDataset());
+            String dataciteXml = getDataCiteXml(dv);
 
             MessageDigest messageDigest = MessageDigest.getInstance("MD5");
             try (PipedInputStream dataciteIn = new PipedInputStream();

src/main/java/edu/harvard/iq/dataverse/engine/command/impl/GoogleCloudSubmitToArchiveCommand.java

+1-6
@@ -1,7 +1,5 @@
 package edu.harvard.iq.dataverse.engine.command.impl;
 
-import edu.harvard.iq.dataverse.DOIDataCiteRegisterService;
-import edu.harvard.iq.dataverse.DataCitation;
 import edu.harvard.iq.dataverse.Dataset;
 import edu.harvard.iq.dataverse.DatasetVersion;
 import edu.harvard.iq.dataverse.DatasetLock.Reason;
@@ -73,10 +71,7 @@ public WorkflowStepResult performArchiveSubmission(DatasetVersion dv, ApiToken t
             String spaceName = dataset.getGlobalId().asString().replace(':', '-').replace('/', '-')
                     .replace('.', '-').toLowerCase();
 
-            DataCitation dc = new DataCitation(dv);
-            Map<String, String> metadata = dc.getDataCiteMetadata();
-            String dataciteXml = DOIDataCiteRegisterService.getMetadataFromDvObject(
-                    dv.getDataset().getGlobalId().asString(), metadata, dv.getDataset());
+            String dataciteXml = getDataCiteXml(dv);
             MessageDigest messageDigest = MessageDigest.getInstance("MD5");
             try (PipedInputStream dataciteIn = new PipedInputStream();
                     DigestInputStream digestInputStream = new DigestInputStream(dataciteIn, messageDigest)) {

src/main/java/edu/harvard/iq/dataverse/engine/command/impl/LocalSubmitToArchiveCommand.java

+3-7
@@ -1,7 +1,5 @@
 package edu.harvard.iq.dataverse.engine.command.impl;
 
-import edu.harvard.iq.dataverse.DOIDataCiteRegisterService;
-import edu.harvard.iq.dataverse.DataCitation;
 import edu.harvard.iq.dataverse.Dataset;
 import edu.harvard.iq.dataverse.DatasetVersion;
 import edu.harvard.iq.dataverse.DatasetLock.Reason;
@@ -58,18 +56,16 @@ public WorkflowStepResult performArchiveSubmission(DatasetVersion dv, ApiToken t
             String spaceName = dataset.getGlobalId().asString().replace(':', '-').replace('/', '-')
                     .replace('.', '-').toLowerCase();
 
-            DataCitation dc = new DataCitation(dv);
-            Map<String, String> metadata = dc.getDataCiteMetadata();
-            String dataciteXml = DOIDataCiteRegisterService
-                    .getMetadataFromDvObject(dv.getDataset().getGlobalId().asString(), metadata, dv.getDataset());
-
+            String dataciteXml = getDataCiteXml(dv);
+
             FileUtils.writeStringToFile(
                     new File(localPath + "/" + spaceName + "-datacite.v" + dv.getFriendlyVersionNumber() + ".xml"),
                     dataciteXml, StandardCharsets.UTF_8);
             BagGenerator bagger = new BagGenerator(new OREMap(dv, false), dataciteXml);
             bagger.setNumConnections(getNumberOfBagGeneratorThreads());
             bagger.setAuthenticationKey(token.getTokenString());
             zipName = localPath + "/" + spaceName + "v" + dv.getFriendlyVersionNumber() + ".zip";
+            // ToDo: generateBag(File f, true) seems to do the same thing (with a .tmp extension) - since we don't have to use a stream here, could probably just reuse the existing code?
             bagger.generateBag(new FileOutputStream(zipName + ".partial"));
 
             File srcFile = new File(zipName + ".partial");
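
Note: a quick worked example of the naming logic above, using a hypothetical DOI (all values are placeholders, not taken from this commit):

  String globalId = "doi:10.5072/FK2/ABCDEF"; // hypothetical persistent identifier
  String spaceName = globalId.replace(':', '-').replace('/', '-').replace('.', '-').toLowerCase();
  // spaceName is now "doi-10-5072-fk2-abcdef"; for version 1.0, the command writes
  // <localPath>/doi-10-5072-fk2-abcdef-datacite.v1.0.xml and <localPath>/doi-10-5072-fk2-abcdefv1.0.zip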
