improved Schema.org JSON-LD output #5169

Merged
merged 38 commits into from
Nov 7, 2018
a6915d7
change "author" to "creator" #4371
pdurbin Oct 11, 2018
fe01e80
assert how "keywords" works, increase code coverage #4371
pdurbin Oct 11, 2018
101bd81
stop hard-coding "Dataverse" as the provider #4371
pdurbin Oct 11, 2018
4ca8f38
add @id and url as persistent URL #4371
pdurbin Oct 11, 2018
043080d
add funder #4371
pdurbin Oct 11, 2018
0d260fa
add author identifier #4371
pdurbin Oct 11, 2018
a84f1d1
add spatialCoverage #4371
pdurbin Oct 11, 2018
ab2de1a
make descriptions multi-valued (backward incompatible) #4371
pdurbin Oct 12, 2018
4c1387a
add file metadata to output #4371
pdurbin Oct 12, 2018
217b928
express author identifier as a URL #4371
pdurbin Oct 22, 2018
0cdbe3e
add second funder type: grantNumberAgency #4371
pdurbin Oct 22, 2018
22e08c8
assert existing citation behavior #4371
pdurbin Oct 22, 2018
3a6c156
citations now objects not strings (backward incompatible) #4371
pdurbin Oct 22, 2018
354462e
put spatial coverage on single line, sorted with commas #4371
pdurbin Oct 22, 2018
24c3249
duplicate values in "creator" and "author" and note why #4371
pdurbin Oct 23, 2018
8c242f8
add test for datePublished #4371
pdurbin Oct 23, 2018
f312949
add tests for temporalCoverage #4371
pdurbin Oct 23, 2018
c0361f0
add tests for "license" #4371
pdurbin Oct 23, 2018
c162e45
add "publisher" as duplicate of "provider" #4371
pdurbin Oct 23, 2018
e0fc235
add more javadoc #4371
pdurbin Oct 23, 2018
8760a5a
note backward-incompatible changes in API Guide #4371
pdurbin Oct 26, 2018
e35651a
list Schema.org in User Guide, links from Admin Guide #4371
pdurbin Oct 26, 2018
ecde23a
add developer-oriented docs #4371
pdurbin Oct 26, 2018
aecc12d
add test for non-CC0 #4371
pdurbin Oct 26, 2018
81cac6d
fix typo and add two validation tools to list #4371
pdurbin Oct 26, 2018
c27f10b
write example JSON to disk, more real DOIs #4371
pdurbin Oct 26, 2018
f8a5539
Merge branch 'develop' into 4371-schemaorg #4371
pdurbin Oct 30, 2018
5c2ed68
stop using English ("Funder") in logic, release notes experiment #4371
pdurbin Oct 31, 2018
2a8aeee
Revert "stop using English ("Funder") in logic, release notes experim…
pdurbin Nov 2, 2018
33cd45f
add note about JVM option being experimental #4371
pdurbin Nov 2, 2018
3a95edf
remove "schemaVersion" from output #4371
pdurbin Nov 5, 2018
50bc8ca
remove "url" from output #4371
pdurbin Nov 5, 2018
882cbfb
clarify publisher vs. provider comment #4371
pdurbin Nov 5, 2018
7cd5622
single "isValidAuthorIdentifier" method, pass in regex #4371
pdurbin Nov 5, 2018
55fbb93
Merge branch 'develop' into 4371-schemaorg #4371
pdurbin Nov 5, 2018
4ed14e2
switch to NullSafeJsonBuilder #4371
pdurbin Nov 5, 2018
fcae94e
Prevent href, target=_blank from getting into Schema.org JSON-LD outp…
pdurbin Nov 6, 2018
0e0b55d
add `@id` along side `identifier` at file level #4371
pdurbin Nov 6, 2018
10 changes: 10 additions & 0 deletions doc/sphinx-guides/source/admin/metadataexport.rst
@@ -34,3 +34,13 @@ Export Failures
---------------

An export batch job, whether started via the API, or by the application timer, will leave a detailed log in your configured logs directory. This is the same location where your main Glassfish server.log is found. The name of the log file is ``export_[timestamp].log`` - for example, *export_2016-08-23T03-35-23.log*. The log will contain the numbers of datasets processed successfully and those for which metadata export failed, with some information on the failures detected. Please attach this log file if you need to contact Dataverse support about metadata export problems.
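A script that gathers these logs (for example, to attach the most recent one to a support request) needs to recover the timestamp from the file name. The sketch below assumes the timestamp always has the ``yyyy-MM-dd'T'HH-mm-ss`` shape of the example name above (colons replaced by hyphens); the class and method names are hypothetical and not part of Dataverse.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExportLogNames {
    // ASSUMPTION: layout inferred from the example name export_2016-08-23T03-35-23.log.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH-mm-ss");
    private static final Pattern NAME =
            Pattern.compile("^export_(\\d{4}-\\d{2}-\\d{2}T\\d{2}-\\d{2}-\\d{2})\\.log$");

    /** Returns the timestamp encoded in an export log file name, or empty if the name does not fit. */
    static Optional<LocalDateTime> timestampOf(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty(); // e.g. server.log
        }
        try {
            return Optional.of(LocalDateTime.parse(m.group(1), STAMP));
        } catch (DateTimeParseException e) {
            return Optional.empty(); // right shape, impossible value such as month 13
        }
    }
}
```

Sorting the parsed timestamps, rather than the raw file names, keeps the selection correct even if other files share the logs directory.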

Downloading Metadata via GUI
----------------------------

The :doc:`/user/dataset-management` section of the User Guide explains how end users can download the metadata formats above from the Dataverse GUI.

Downloading Metadata via API
----------------------------

The :doc:`/api/native-api` section of the API Guide explains how end users can download the metadata formats above via API.
10 changes: 10 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -252,6 +252,16 @@ Export Metadata of a Dataset in Various Formats

.. note:: Supported exporters (export formats) are ``ddi``, ``oai_ddi``, ``dcterms``, ``oai_dc``, ``schema.org``, and ``dataverse_json``.
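As an illustration of selecting one of these exporters from a client, the sketch below builds a request URL for the ``schema.org`` format. It assumes the export endpoint takes the form ``GET /api/datasets/export?exporter=...&persistentId=...`` (check this against the request shown earlier in this section of the API Guide); the server URL and DOI are placeholders, and the class name is hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ExportUrls {
    /**
     * Builds a metadata export URL.
     * ASSUMPTION: path and query parameter names follow the native API export endpoint.
     */
    static URI exportUrl(String serverUrl, String exporter, String persistentId) {
        String query = "exporter=" + encode(exporter)
                + "&persistentId=" + encode(persistentId);
        return URI.create(serverUrl + "/api/datasets/export?" + query);
    }

    // Percent-encodes reserved characters such as ':' and '/' found in DOIs.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

For example, ``exportUrl("https://dataverse.example.edu", "schema.org", "doi:10.5072/FK2/ABC123")`` yields a URL whose ``persistentId`` parameter is ``doi%3A10.5072%2FFK2%2FABC123``.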

Schema.org JSON-LD
^^^^^^^^^^^^^^^^^^

Please note that the ``schema.org`` format has changed in backwards-incompatible ways after Dataverse 4.9.4:

- "description" was a single string and now it is an array of strings.
- "citation" was an array of strings and now it is an array of objects.

Both forms are valid according to Google's Structured Data Testing Tool at https://search.google.com/structured-data/testing-tool . (This tool will report "The property affiliation is not recognized by Google for an object of type Thing" and this known issue is being tracked at https://github.com/IQSS/dataverse/issues/5029 .) Schema.org JSON-LD is an evolving standard that permits a great deal of flexibility. For example, https://schema.org/docs/gs.html#schemaorg_expected indicates that even when objects are expected, it's ok to just use text. As with all metadata export formats, we will try to keep the Schema.org JSON-LD format Dataverse emits backward-compatible to make integrations more stable, despite the flexibility that's afforded by the standard.
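An integration that reads output from installations on both sides of this change can accept either shape. The sketch below assumes the JSON has already been parsed by some JSON library that maps JSON strings to ``String``, arrays to ``List`` and objects to ``Map``; it also assumes each new-style citation object keeps the citation string under a ``"text"`` key, which should be confirmed against real output. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SchemaOrgCompat {
    /** "description": one string in 4.9.4 and earlier, an array of strings afterwards. */
    static List<String> descriptions(Object value) {
        List<String> result = new ArrayList<>();
        if (value instanceof String) {
            result.add((String) value);
        } else if (value instanceof List) {
            for (Object item : (List<?>) value) {
                if (item instanceof String) {
                    result.add((String) item);
                }
            }
        }
        return result; // empty when the property is missing (null) or of an unexpected type
    }

    /** "citation": strings in 4.9.4 and earlier, objects afterwards (ASSUMED "text" key). */
    static List<String> citationTexts(Object value) {
        List<String> result = new ArrayList<>();
        if (!(value instanceof List)) {
            return result;
        }
        for (Object item : (List<?>) value) {
            if (item instanceof String) {
                result.add((String) item);
            } else if (item instanceof Map) {
                Object text = ((Map<?, ?>) item).get("text");
                if (text instanceof String) {
                    result.add((String) text);
                }
            }
        }
        return result;
    }
}
```

Normalizing at the edge of the integration keeps the rest of the consumer code working with a single representation.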

List Files in a Dataset
~~~~~~~~~~~~~~~~~~~~~~~

13 changes: 13 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
@@ -755,6 +755,19 @@ This JVM option is used to configure the path where all the language specific pr

If this value is not set, by default, a Dataverse installation will read the English language property files from the Java Application.

dataverse.files.hide-schema-dot-org-download-urls
+++++++++++++++++++++++++++++++++++++++++++++++++

Please note that this setting is experimental.

By default, download URLs to files will be included in Schema.org JSON-LD output. To prevent these URLs from being included in the output, set ``dataverse.files.hide-schema-dot-org-download-urls`` to true as in the example below.

``./asadmin create-jvm-options '-Ddataverse.files.hide-schema-dot-org-download-urls=true'``

Please note that download URLs may also be omitted for certain files for other reasons, such as when a guestbook entry is required or the file is restricted.
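Taken together, the option and the per-file conditions above amount to a simple rule. The sketch below assumes the flag is read from JVM system properties (where ``-D`` options added with ``create-jvm-options`` end up) and that all three conditions simply suppress the URL; the class and method names are hypothetical, not Dataverse's own.

```java
final class DownloadUrlPolicy {
    static final String HIDE_PROPERTY = "dataverse.files.hide-schema-dot-org-download-urls";

    /** Hypothetical check: should a file's download URL appear in the Schema.org output? */
    static boolean includeDownloadUrl(boolean fileRestricted, boolean guestbookRequired) {
        // Boolean.getBoolean is true only if the property is set to "true" (case-insensitive).
        boolean hideAll = Boolean.getBoolean(HIDE_PROPERTY);
        return !hideAll && !fileRestricted && !guestbookRequired;
    }
}
```

Because an unset property reads as ``false``, download URLs stay in the output by default, matching the behavior described above.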

For more on Schema.org JSON-LD, see the :doc:`/admin/metadataexport` section of the Admin Guide.

Database Settings
-----------------

2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/user/dataset-management.rst
@@ -20,7 +20,7 @@ A dataset contains three levels of metadata:

For more details about what Citation and Domain Specific Metadata is supported please see our :ref:`user-appendix`.

- Note that once a dataset has been published its metadata may be exported. A button on the dataset page's metadata tab will allow a user to export the metadata of the most recently published version of the dataset. Currently supported export formats are DDI, Dublin Core and JSON.
+ Note that once a dataset has been published its metadata may be exported. A button on the dataset page's metadata tab will allow a user to export the metadata of the most recently published version of the dataset. Currently supported export formats are DDI, Dublin Core, Schema.org JSON-LD, and Dataverse's native JSON format.

Adding a New Dataset
====================
58 changes: 57 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/DatasetAuthor.java
@@ -7,6 +7,7 @@
package edu.harvard.iq.dataverse;

import java.util.Comparator;
import java.util.regex.Pattern;


/**
@@ -87,5 +88,60 @@ public boolean isEmpty() {
&& (name==null || name.getValue().trim().equals(""))
);
}


/**
* https://support.orcid.org/hc/en-us/articles/360006897674-Structure-of-the-ORCID-Identifier
*/
final public static String REGEX_ORCID = "^\\d{4}-\\d{4}-\\d{4}-(\\d{4}|\\d{3}X)$";
final public static String REGEX_ISNI = "^\\d*$";
final public static String REGEX_LCNA = "^[a-z]+\\d+$";
final public static String REGEX_VIAF = "^\\d*$";
/**
* GND regex from https://www.wikidata.org/wiki/Property:P227
*/
final public static String REGEX_GND = "^1[01]?\\d{7}[0-9X]|[47]\\d{6}-\\d|[1-9]\\d{0,7}-[0-9X]|3\\d{7}[0-9X]$";

/**
* Each author identification type has its own valid pattern/syntax.
*/
public static Pattern getValidPattern(String regex) {
return Pattern.compile(regex);
}

public String getIdentifierAsUrl() {
if (idType != null && !idType.isEmpty() && idValue != null && !idValue.isEmpty()) {
DatasetFieldValueValidator datasetFieldValueValidator = new DatasetFieldValueValidator();
switch (idType) {
case "ORCID":
if (datasetFieldValueValidator.isValidAuthorIdentifier(idValue, getValidPattern(REGEX_ORCID))) {
return "https://orcid.org/" + idValue;
}
break;
case "ISNI":
if (datasetFieldValueValidator.isValidAuthorIdentifier(idValue, getValidPattern(REGEX_ISNI))) {
return "http://www.isni.org/isni/" + idValue;
}
break;
case "LCNA":
if (datasetFieldValueValidator.isValidAuthorIdentifier(idValue, getValidPattern(REGEX_LCNA))) {
return "http://id.loc.gov/authorities/names/" + idValue;
}
break;
case "VIAF":
if (datasetFieldValueValidator.isValidAuthorIdentifier(idValue, getValidPattern(REGEX_VIAF))) {
return "https://viaf.org/viaf/" + idValue;
}
break;
case "GND":
if (datasetFieldValueValidator.isValidAuthorIdentifier(idValue, getValidPattern(REGEX_GND))) {
return "https://d-nb.info/gnd/" + idValue;
}
break;
default:
break;
}
}
return null;
}

}
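The ``getIdentifierAsUrl`` switch above pairs each identifier type with a validation regex and a URL prefix. The same mapping can be sketched in table-driven form, shown here as a standalone class that copies the regexes and prefixes from the diff. This is an illustrative rewrite, not the merged code: ``AuthorIdUrls`` and ``Scheme`` are hypothetical names, and unlike ``getValidPattern``, which compiles a pattern on every call, the sketch compiles each pattern once, since ``Pattern`` instances are immutable and safe to share between threads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class AuthorIdUrls {
    /** A validation pattern paired with the URL prefix used for a valid identifier. */
    private static final class Scheme {
        final Pattern pattern;
        final String urlPrefix;

        Scheme(String regex, String urlPrefix) {
            this.pattern = Pattern.compile(regex);
            this.urlPrefix = urlPrefix;
        }
    }

    private static final Map<String, Scheme> SCHEMES = new HashMap<>();

    static {
        // Regexes and prefixes copied from DatasetAuthor in the diff above.
        SCHEMES.put("ORCID", new Scheme("^\\d{4}-\\d{4}-\\d{4}-(\\d{4}|\\d{3}X)$", "https://orcid.org/"));
        SCHEMES.put("ISNI", new Scheme("^\\d*$", "http://www.isni.org/isni/"));
        SCHEMES.put("LCNA", new Scheme("^[a-z]+\\d+$", "http://id.loc.gov/authorities/names/"));
        SCHEMES.put("VIAF", new Scheme("^\\d*$", "https://viaf.org/viaf/"));
        SCHEMES.put("GND", new Scheme(
                "^1[01]?\\d{7}[0-9X]|[47]\\d{6}-\\d|[1-9]\\d{0,7}-[0-9X]|3\\d{7}[0-9X]$",
                "https://d-nb.info/gnd/"));
    }

    /** Returns the identifier as a URL, or null if the type is unknown or the value does not validate. */
    static String identifierAsUrl(String idType, String idValue) {
        if (idType == null || idValue == null || idValue.isEmpty()) {
            return null;
        }
        Scheme scheme = SCHEMES.get(idType);
        // matches() requires the whole value to match, as in isValidAuthorIdentifier.
        if (scheme == null || !scheme.pattern.matcher(idValue).matches()) {
            return null;
        }
        return scheme.urlPrefix + idValue;
    }
}
```

Adding a new identifier type then means adding one table entry rather than another ``case`` branch.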
@@ -14,6 +14,8 @@
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.validation.ConstraintValidator;
import javax.validation.ConstraintValidatorContext;
import org.apache.commons.lang.StringUtils;
@@ -216,4 +218,8 @@ private boolean isValidDate(String dateString, String pattern) {
return valid;
}

public boolean isValidAuthorIdentifier(String userInput, Pattern pattern) {
return pattern.matcher(userInput).matches();
}

}
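``isValidAuthorIdentifier`` uses ``Matcher.matches()``, which requires the entire input to match the pattern. That matters for ``REGEX_GND``: because ``|`` has the lowest precedence in a Java regex, its ``^`` anchors only the first alternative and its ``$`` only the last. With ``matches()`` the validator still behaves as intended, but the same pattern used with ``find()`` would accept a GND number embedded in other text. The small demonstration below compares the pattern as written with a grouped variant; the grouped form is an illustration, not a change made in this PR.

```java
import java.util.regex.Pattern;

final class GndAnchoringDemo {
    // REGEX_GND as written in DatasetAuthor: ^ and $ bind only to the first and last alternatives.
    static final Pattern AS_WRITTEN = Pattern.compile(
            "^1[01]?\\d{7}[0-9X]|[47]\\d{6}-\\d|[1-9]\\d{0,7}-[0-9X]|3\\d{7}[0-9X]$");

    // Same alternatives inside a non-capturing group, so both anchors apply to every alternative.
    static final Pattern GROUPED = Pattern.compile(
            "^(?:1[01]?\\d{7}[0-9X]|[47]\\d{6}-\\d|[1-9]\\d{0,7}-[0-9X]|3\\d{7}[0-9X])$");

    /** Whole-input match, as used by isValidAuthorIdentifier. */
    static boolean wholeMatch(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    /** Substring search: succeeds if any part of s matches. */
    static boolean partialMatch(Pattern p, String s) {
        return p.matcher(s).find();
    }
}
```

For ``"abc4123456-7xyz"``, ``partialMatch(AS_WRITTEN, ...)`` succeeds because the unanchored second alternative matches the middle of the string, while ``wholeMatch`` with either pattern and ``partialMatch(GROUPED, ...)`` both reject it.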
@@ -14,7 +14,9 @@

public class DatasetRelPublication {


/**
* The "text" is the citation of the related publication.
*/
private String text;
private String idType;
private String idNumber;