Users can download Schema.org metadata from Export Metadata pulldown #3700
Comments
I just implemented this in pull request #4302. Here's how it looks in the GUI:
Moving to code review at https://waffle.io/IQSS/dataverse
Could schema.org be capitalized? "Schema.org JSON-LD". I know the schema.org website uses all lowercase in its site header, but it's not so consistent throughout the rest of the website. DataCite capitalizes it, and it looks better in the list.
@jggautier sure, capitalized in b00d4d6
@jggautier I'm looking at https://schema.org/Dataset for guidance on how to support multiple descriptions (something @kcondon asked about). It says keywords are delimited by commas. Am I supposed to do the same with description? What if the description itself has commas? Or am I supposed to put each description inside an array? That doesn't validate. Please advise. Thanks. Also, I'm not sure what's going on with that warning about affiliation above.
I separated each description with a comma in the example JSON doc, like with the keywords. I thought that because each value is enclosed in quotation marks, commas inside the values wouldn't conflict with the commas used as delimiters. (The tool asks for an @type, and the @type for description is "Text", but then we get a validation error because the @type for "name" is "Thing". Since we're not including anything else with the description, such as the description date, it seems easier to exclude name.) We decided to ignore that author affiliation warning. When there's no @type, the default @type "Thing" is used. Google's validation tool doesn't like "Thing" for author affiliation, but we can't use the @types it does like: "Person" and "Organization". We agreed it's okay to ignore the warning. The less-preferred alternatives are (1) using the @type "Person" or "Organization", which would say that every author is a person, or that every author is an organization, which isn't true, since Dataverse has no way of knowing which authors are people and which are organizations, or (2) not including an affiliation.
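A minimal Python sketch of the two shapes discussed above, using made-up description text (the values and the author fragment are hypothetical, not actual Dataverse output). It shows why a JSON array keeps commas inside values intact while one comma-joined string loses the boundaries between descriptions, and what an author fragment without an @type looks like.

```python
import json

# Hypothetical descriptions for one dataset; the first contains commas.
descriptions = [
    "Survey data, waves 1-3, cleaned.",
    "Replication files for the article.",
]

# Shape A: a JSON array of strings. Each value is quoted separately,
# so commas inside a value never act as separators.
as_array = {"@type": "Dataset", "description": descriptions}

# Shape B: one comma-joined string. The original boundaries between
# descriptions can no longer be recovered by splitting on commas.
as_string = {"@type": "Dataset", "description": ", ".join(descriptions)}

# Author fragment with no "@type"; per the discussion above, the
# validator then treats it as "Thing" and warns about "affiliation".
author = {"name": "Doe, Jane", "affiliation": "Example University"}

if __name__ == "__main__":
    round_trip = json.loads(json.dumps(as_array))["description"]
    print(round_trip == descriptions)  # True: the array survives intact
    naive_split = as_string["description"].split(", ")
    print(len(naive_split))  # 4, not 2: the boundaries are lost
    print(json.dumps(author))
```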
@jggautier ok so you want something like this (as shown in your screenshot above):
Makes sense, thanks. I'll see what I can do. In the code there are a lot of places where we grab just the first description of a dataset, so I'm not sure why it's so important to expose multiple descriptions here, but I assume you and @kcondon have your reasons. I'm still confused about how to interpret https://schema.org/Dataset. I'm surprised that a field like "description" can be either a simple string...
... or an array...
All "description" says is "Text" and "A description of the item." And if you look at https://schema.org/Text, it doesn't say anything like "this can be a string or an array" from what I can tell. Are there validators for schema.org apart from Google's online tool? Is the format expressed in JSON Schema (http://json-schema.org) or similar? Ideally I'd add some unit tests to the code to make sure Dataverse is emitting schema.org JSON that is valid according to the specification. Thanks also for letting me know that "affiliation" is expected to throw a warning. This feels like a bug to me. I can take out affiliation if we would prefer there to be no warnings. At the very least I'll add javadoc indicating that these warnings are expected.
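Lacking a JSON Schema for schema.org, one stopgap is a hand-written structural check inside a unit test. The Python sketch below is hypothetical: `check_dataset_jsonld` and the rules it enforces are assumptions chosen for illustration, not the schema.org specification and not what Google's tool checks. It accepts "description" as either a string or a list of strings, the two shapes discussed above.

```python
import json
from typing import Any, Dict, List


def check_dataset_jsonld(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means all checks passed.

    Hypothetical structural checks only; this is not a schema.org validator.
    """
    problems: List[str] = []
    if doc.get("@context") not in ("http://schema.org", "https://schema.org"):
        problems.append("@context should be schema.org")
    if doc.get("@type") != "Dataset":
        problems.append('@type should be "Dataset"')
    desc = doc.get("description")
    is_text = isinstance(desc, str)
    is_text_list = isinstance(desc, list) and all(isinstance(d, str) for d in desc)
    if desc is not None and not (is_text or is_text_list):
        problems.append("description should be Text or a list of Text")
    for author in doc.get("author", []):
        if not author.get("name"):
            problems.append("every author needs a name")
    return problems


if __name__ == "__main__":
    emitted = json.loads(
        '{"@context": "http://schema.org", "@type": "Dataset",'
        ' "description": "A sample dataset.",'
        ' "author": [{"name": "Doe, Jane"}]}'
    )
    print(check_dataset_jsonld(emitted))  # []
    print(check_dataset_jsonld({"@type": "Thing", "description": 42}))
```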
@landreev @pdurbin @jggautier Now that we have this in export, should we use this cached version for the JSON-LD on the page? (as a performance enhancement)
@scolapasta let's have a face-to-face conversation with @djbrooke and others to agree on the definition of done for this issue. It was estimated as a "2", and it's already in QA. I think @kcondon was close to merging it until he had a question about multiple descriptions. Speaking of multiple descriptions, @jggautier and I talked this out this morning, and we are not going to support multiple descriptions after all. All of the examples at https://schema.org/description show a single description, like this: When folks like @sfarnel and @johnhuck come along and investigate how we've implemented schema.org support (see ualbertalib/metadata#154), I assume they'll expect a description as in the screenshot above. That is to say, we've implemented "description" as a string rather than an array of strings. I still find it very odd that Google's testing tool is happy with both. Again, I'd love to see some JSON Schema or similar for this specification so that I could write unit tests to validate the JSON that Dataverse is emitting. @landreev would be in a better position than I am to know how much work it would be to make the performance enhancement you're talking about. The only change I was thinking of making to the code is adding some javadoc so that developers aren't surprised (as I was) that the JSON-LD we emit shows a warning in https://search.google.com/structured-data/testing-tool for "affiliation". @jggautier mentioned that someone at Google has approved ignoring the warning.
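With "description" settled as a single string, the emitting side needs a rule for datasets that have several descriptions. The Python sketch below assumes, as the thread says other parts of the code already do, that only the first description is used; `build_dataset_jsonld` and `first_description` are hypothetical names, not Dataverse's actual `getJsonLd()`.

```python
import json
from typing import Any, Dict, List, Optional


def first_description(descriptions: List[str]) -> Optional[str]:
    """Return the first description, or None if the dataset has none."""
    return descriptions[0] if descriptions else None


def build_dataset_jsonld(name: str, descriptions: List[str]) -> Dict[str, Any]:
    doc: Dict[str, Any] = {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "name": name,
    }
    desc = first_description(descriptions)
    if desc is not None:
        # A plain string, not an array, even when several descriptions exist.
        doc["description"] = desc
    return doc


if __name__ == "__main__":
    doc = build_dataset_jsonld("Example", ["First, with a comma.", "Second."])
    print(json.dumps(doc, indent=2))
```

Omitting the key entirely when there is no description avoids emitting an empty or null value.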
Sure; we could consider it in scope, or we could open it as a separate issue (or decide it's not necessary). I just wanted to make sure we considered it. I think the change would just be in the method that gets the JSON-LD, to look for a file instead of generating it, but yes, @landreev will be able to estimate better.
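The change suggested here (look for the cached export file before generating the JSON-LD) can be sketched as a read-through lookup. This is a Python illustration under assumptions: the cache file name and the `generate` callback are hypothetical, and the sketch deliberately does not write the cache, assuming the export step stays responsible for producing that file.

```python
import tempfile
from pathlib import Path
from typing import Callable


def get_jsonld(cache_file: Path, generate: Callable[[], str]) -> str:
    """Return cached schema.org JSON-LD if the export file exists, else generate it."""
    try:
        return cache_file.read_text(encoding="utf-8")
    except FileNotFoundError:
        # No cached export yet (for example, export has not run): fall back.
        return generate()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "export_schema.org.cached"  # hypothetical name
        print(get_jsonld(cache, lambda: '{"source": "generated"}'))
        cache.write_text('{"source": "cache"}', encoding="utf-8")
        print(get_jsonld(cache, lambda: '{"source": "generated"}'))
```

Catching `FileNotFoundError` instead of checking `exists()` first avoids a race if the file disappears between the check and the read.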
Ok, all morning I've been discussing this issue with @scolapasta @jggautier @kcondon @landreev @pameyer and probably some others. I just made some changes to the code, which I'll enumerate below, and pass this ticket to QA. If anyone wants to do some code review, please be my guest.
The format we are emitting is the same as when @kcondon last tested it yesterday. @jggautier plans to open an issue about how we display descriptions (especially if there are multiple) in the app.
implement export of schema.org JSON-LD #3700
commit e19a346 Author: Ruben Andreassen <rubean85@gmail.com> Date: Mon Dec 4 12:20:54 2017 +0100 Forgot username
commit 0d478a7 Merge: 45288aa 8aa4150 Author: Ruben Andreassen <rubean85@gmail.com> Date: Mon Dec 4 10:56:10 2017 +0100 Merge dataporten into 4334-oauth-dataporten
commit 45288aa Merge: caf6371 4648b6a Author: Ruben <rubean85@gmail.com> Date: Fri Dec 1 14:45:44 2017 +0100 Merge pull request #1 from IQSS/develop test
commit 4648b6a Merge: 0f36aa0 fff836c Author: kcondon <kcondon@hmdc.harvard.edu> Date: Thu Nov 30 18:44:35 2017 -0500 Merge pull request IQSS#4331 from IQSS/4330-no-affiliation add null check for datasetAuthor.getAffiliation() IQSS#4330
commit fff836c Author: Philip Durbin <philip_durbin@harvard.edu> Date: Thu Nov 30 16:39:26 2017 -0500 add null check for datasetAuthor.getAffiliation() IQSS#4330
commit 0f36aa0 Merge: e2878ce fad8669 Author: kcondon <kcondon@hmdc.harvard.edu> Date: Thu Nov 30 15:07:54 2017 -0500 Merge pull request IQSS#4325 from IQSS/4324-header-padding Fixed padding layout issue with dataverse name text link in header IQSS#4324
commit fad8669 Author: Michael Heppler <mheppler@hmdc.harvard.edu> Date: Thu Nov 30 10:14:53 2017 -0500 Fixed padding layout issue with dataverse name text link in header. [ref IQSS#4324]
commit e2878ce Merge: d785c5c cb9647f Author: kcondon <kcondon@hmdc.harvard.edu> Date: Wed Nov 29 18:22:53 2017 -0500 Merge pull request IQSS#4305 from IQSS/4304-navbar-search use "?" (`&#63;`) rather than "&" (`&#38;`) before "q" IQSS#4304
commit d785c5c Merge: a881f36 3cc02d0 Author: kcondon <kcondon@hmdc.harvard.edu> Date: Wed Nov 29 18:19:25 2017 -0500 Merge pull request IQSS#4302 from IQSS/3700-export-schema.org implement export of schema.org JSON-LD IQSS#3700
commit 3cc02d0 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 29 12:53:04 2017 -0500 have dataset page get cached JSON-LD, if available IQSS#3700
commit 84224bd Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 29 12:45:53 2017 -0500 guard against null terms.getTermsOfUse() IQSS#3700
commit ba9c6bd Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 29 12:28:16 2017 -0500 API: document "schema.org" as a supported export format IQSS#3700
commit e5c2528 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 29 12:11:17 2017 -0500 capitalize Schema.org in guides IQSS#3700
commit 086824d Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 29 10:57:32 2017 -0500 note that we know "affliation" throws a warning IQSS#3700
commit a881f36 Merge: b20ab14 23b865c Author: kcondon <kcondon@hmdc.harvard.edu> Date: Tue Nov 28 16:28:04 2017 -0500 Merge pull request IQSS#4312 from IQSS/4197-bundle-error Fixed bundle reference to "parent" dataverse for Theme + Widget pg IQSS#4197
commit 34859e7 Merge: 2f278cc b20ab14 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Tue Nov 28 16:24:56 2017 -0500 Merge branch 'develop' into 3700-export-schema.org IQSS#3700
commit 23b865c Author: Michael Heppler <mheppler@hmdc.harvard.edu> Date: Tue Nov 28 14:42:12 2017 -0500 Fixed bundle reference to "parent" dataverse for Theme + Widget pg. [ref IQSS#4197]
commit b20ab14 Merge: caf6371 8e6354a Author: kcondon <kcondon@hmdc.harvard.edu> Date: Tue Nov 28 14:01:39 2017 -0500 Merge pull request IQSS#4277 from IQSS/4197-dv-header 4197 dv header
commit 8e6354a Author: Michael Heppler <mheppler@hmdc.harvard.edu> Date: Tue Nov 28 13:23:15 2017 -0500 Changed references from "customization" to "theme" in Theme + Widgets pg. [ref IQSS#4197]
commit c312a85 Author: Derek Murphy <dlmurphy@g.harvard.edu> Date: Tue Nov 28 13:05:39 2017 -0500 Doc rewrites [IQSS#4197] Rewrote some text on the config page for clarity, changed terminology usage in dataverse management page to make it more consistent
commit f68b81d Author: Michael Heppler <mheppler@hmdc.harvard.edu> Date: Tue Nov 28 12:15:40 2017 -0500 Removed commented out theme logic found in QA. [ref IQSS#4197]
commit 624922f Author: Philip Durbin <philip_durbin@harvard.edu> Date: Tue Nov 28 11:09:26 2017 -0500 when adding row to dataversetheme, use white instead of gray IQSS#4197
commit cb9647f Author: Philip Durbin <philip_durbin@harvard.edu> Date: Mon Nov 27 10:27:30 2017 -0500 use "?" (&#63;) rather than "&" (&#38;) before "q" IQSS#4304
commit d8028f1 Merge: 36d9228 caf6371 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Mon Nov 27 09:33:03 2017 -0500 Merge branch 'develop' into 4197-dv-header IQSS#4197
commit 2f278cc Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 22 12:33:56 2017 -0500 cleanup IQSS#3700
commit b00d4d6 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 22 12:28:25 2017 -0500 capitalize "Schema.org" IQSS#3700
commit 8f52663 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 22 11:06:41 2017 -0500 implement export of schema.org JSON-LD IQSS#3700
commit caf6371 Merge: c67a39f d80b9d1 Author: kcondon <kcondon@hmdc.harvard.edu> Date: Tue Nov 21 16:29:07 2017 -0500 Merge pull request IQSS#4297 from IQSS/orcid_v21 orcid v2.1 changes (mainly https for profile page link)
commit c67a39f Merge: 0918fae a756751 Author: kcondon <kcondon@hmdc.harvard.edu> Date: Mon Nov 20 15:48:37 2017 -0500 Merge pull request IQSS#4252 from IQSS/2243-schema.org-json-ld 2243 schema.org json ld
commit d80b9d1 Author: Pete Meyer <pameyer@crystal.harvard.edu> Date: Mon Nov 20 14:32:09 2017 -0500 orcid v2.1 changes (mainly https for profile page link)
commit 0918fae Merge: 3013c0d dcfcbaf Author: kcondon <kcondon@hmdc.harvard.edu> Date: Mon Nov 20 14:31:41 2017 -0500 Merge pull request IQSS#4276 from IQSS/4250-ingest-failed make it clear that file upload is complete IQSS#4250
commit 3013c0d Merge: b4cea62 3f0f7e8 Author: kcondon <kcondon@hmdc.harvard.edu> Date: Mon Nov 20 14:21:37 2017 -0500 Merge pull request IQSS#4275 from IQSS/4262-describe-method move `describe` from EjbDataverseEngine to Command interface IQSS#4262
commit 36d9228 Merge: d612189 b4cea62 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 17 16:38:34 2017 -0500 Merge branch 'develop' into 4197-dv-header IQSS#4197
commit dcfcbaf Merge: 268c3dc b4cea62 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 17 16:36:21 2017 -0500 Merge branch 'develop' into 4250-ingest-failed IQSS#4250
commit 3f0f7e8 Merge: 633a19d b4cea62 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 17 16:33:37 2017 -0500 Merge branch 'develop' into 4262-describe-method IQSS#4262
commit a756751 Merge: eec1163 b4cea62 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 17 16:32:43 2017 -0500 Merge branch 'develop' into 2243-schema.org-json-ld IQSS#2243 Conflicts (just imports: src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
commit eec1163 Author: Leonid Andreev <leonid@hmdc.harvard.edu> Date: Fri Nov 17 15:58:38 2017 -0500 Per conversation with jgautier stipped the '@type="person"' attribute in the author fragment; since it can be a person or an organization; this results in a warning from google validation tool (because "Thing" is not supposed to have an affiliation) but it appears to be ok to live with it.
commit 0801d56 Author: Leonid Andreev <leonid@hmdc.harvard.edu> Date: Fri Nov 17 15:36:04 2017 -0500 ldjson should will only be embedded into the page if this is the LATEST PUBLISHED version (IQSS#2243)
commit a2742c5 Author: Leonid Andreev <leonid@hmdc.harvard.edu> Date: Fri Nov 17 15:08:40 2017 -0500 latest changest to ld json formatting, making the fragment pass the google validation tool test. (IQSS#2243)
commit d612189 Author: Derek Murphy <dlmurphy@g.harvard.edu> Date: Fri Nov 17 13:01:55 2017 -0500 Docs: extremely nitpicky word change [IQSS#4197] Changed a couple words in the config page.
commit d277669 Author: Michael Heppler <mheppler@hmdc.harvard.edu> Date: Thu Nov 16 16:21:29 2017 -0500 Added tip to Installation Guide > Configuration > Custom Header related to disable root theme. [ref IQSS#4197]
commit 80219c5 Author: Derek Murphy <dlmurphy@g.harvard.edu> Date: Thu Nov 16 11:43:59 2017 -0500 Syntax + typo fix Small edit, fixed a typo and a syntax error in (ironically) a header in the docs
commit e0399c1 Author: Leonid Andreev <leonid@hmdc.harvard.edu> Date: Wed Nov 15 19:50:54 2017 -0500 ...and a quick fix for the "temporalCoverage" entry (IQSS#2243)
commit 67882ff Author: Leonid Andreev <leonid@hmdc.harvard.edu> Date: Wed Nov 15 19:41:05 2017 -0500 the ld json fragment should now be structured as specified in the issue IQSS#2243.
commit 8b8391f Author: Leonid Andreev <leonid@hmdc.harvard.edu> Date: Wed Nov 15 13:24:22 2017 -0500 added topicClassifications and kewords to JSONLD. (IQSS#2243)
commit 28f705c Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 15 12:58:11 2017 -0500 implement :DisableRootDataverseTheme db setting IQSS#4197
commit 268c3dc Author: Michael Heppler <mheppler@hmdc.harvard.edu> Date: Wed Nov 15 12:54:50 2017 -0500 Revised ingest error popover message text. Fixed icon spacing issue. [ref IQSS#4250]
commit 7cd2fea Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 15 12:01:57 2017 -0500 Revert "stub out UI for disabling root dataverse theme IQSS#4197 " This reverts commit b9c3c56. We're going to use a database setting instead.
commit b9c3c56 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 15 08:53:36 2017 -0500 stub out UI for disabling root dataverse theme IQSS#4197
commit 1f938e9 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 15 08:18:25 2017 -0500 Revert "only show header for non-root dataverses IQSS#4197 " This reverts commit 8eccacd.
commit 633a19d Author: Philip Durbin <philip_durbin@harvard.edu> Date: Tue Nov 14 19:02:10 2017 -0500 affectedDvObjects is a better name for this field IQSS#4262
commit 9a3f4a3 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Tue Nov 14 17:10:06 2017 -0500 add the role to the message IQSS#4262
commit 7cfc8ba Author: Philip Durbin <philip_durbin@harvard.edu> Date: Tue Nov 14 10:09:18 2017 -0500 override `describe` in AssignRoleCommand IQSS#4262
commit 023cb8f Author: Philip Durbin <philip_durbin@harvard.edu> Date: Mon Nov 13 16:09:43 2017 -0500 remove parameters since the Command has them IQSS#4262
commit 8eccacd Author: Philip Durbin <philip_durbin@harvard.edu> Date: Mon Nov 13 15:52:37 2017 -0500 only show header for non-root dataverses IQSS#4197
commit 7795e70 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Mon Nov 13 15:22:08 2017 -0500 change header background from gray to white IQSS#4197
commit e434dd0 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Mon Nov 13 14:28:23 2017 -0500 make it clear that file upload is complete IQSS#4250
commit 26eb11d Author: Philip Durbin <philip_durbin@harvard.edu> Date: Mon Nov 13 14:18:57 2017 -0500 move `describe` from EjbDataverseEngine to Command interface IQSS#4262
commit 7d03e70 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Tue Nov 7 16:21:37 2017 -0500 consistency between DC.subject and JSON-LD keywords IQSS#2243
commit 9f1d057 Author: Leonid Andreev <leonid@hmdc.harvard.edu> Date: Mon Nov 6 21:58:32 2017 -0500 one more addition for IQSS#2243 - added temporalCoverage.
commit 8c74e37 Author: Leonid Andreev <leonid@hmdc.harvard.edu> Date: Mon Nov 6 21:28:06 2017 -0500 A few quick fixes for getJsonLd() (and the corresponding test in DatasetVersionTest()); (ref IQSS#2243)
commit c941781 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 3 12:21:12 2017 -0400 explain why ui:insert lines are in the template IQSS#2243
commit 1aa323a Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 3 12:20:52 2017 -0400 remove unused imports used in this branch IQSS#2243
commit f8ca59f Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 3 12:13:05 2017 -0400 add tests for getJsonLd and getPublicationDateAsString IQSS#2243
commit b1db8ee Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 3 11:26:37 2017 -0400 rename to publicationDateAsString and improve javadoc IQSS#2243
commit 8f3083c Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 3 11:14:13 2017 -0400 delete cruft (unused method) IQSS#2243
commit 6c5f044 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Thu Nov 2 15:41:12 2017 -0400 use dateModified and proper schemaVersion URL IQSS#2243
commit 171c8f3 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Thu Nov 2 15:29:35 2017 -0400 move getJsonLd method to DatasetVersion entity IQSS#2243
commit 485a5ca Author: Philip Durbin <philip_durbin@harvard.edu> Date: Thu Nov 2 15:25:37 2017 -0400 don't even try to figure out if the author is a person or not IQSS#2243
commit 80b5a88 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Thu Nov 2 15:19:49 2017 -0400 limit to non-published, not just non-drafts IQSS#2243 Also add helper method.
commit ad71c6a Author: Philip Durbin <philip_durbin@harvard.edu> Date: Thu Nov 2 15:17:32 2017 -0400 use same date format as meta name="DC.date" IQSS#2243
commit 2cc958d Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 1 13:30:15 2017 -0400 fix a number of issues (listed below) IQSS#3793 IQSS#2243 - only show published versions - show URL to DOI dynamically (was hard coded) - show publication date - show correct publisher - show correct provider
commit 5ad88fc Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 1 13:15:00 2017 -0400 better author name parsing (could be an org!) IQSS#3793 IQSS#2243
commit 1b62596 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Tue Oct 31 14:57:01 2017 -0400 stub out dataset in json-ld format IQSS#3793
Users should be able to download Schema.org metadata from the Export Metadata pulldown in each dataset's Metadata tab:
This would be the same metadata that is added to the header of each dataset landing page as schema.org markup (#2243).
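Landing-page JSON-LD is conventionally embedded in the HTML head inside a `<script type="application/ld+json">` element. The Python sketch below shows that embedding plus the rule from commit 0801d56 that the fragment is only emitted for the latest published version. The function names and the `is_latest_published` flag are hypothetical, and escaping `</` is a defensive choice so that a metadata value cannot close the script element early.

```python
import json
from typing import Any, Dict

OPEN_TAG = '<script type="application/ld+json">'
CLOSE_TAG = "</script>"


def jsonld_script_tag(doc: Dict[str, Any]) -> str:
    payload = json.dumps(doc, ensure_ascii=False)
    # "\/" is a legal JSON escape for "/", so JSON parsers read the same
    # value, but the HTML parser no longer sees a closing tag in the payload.
    payload = payload.replace("</", "<\\/")
    return OPEN_TAG + payload + CLOSE_TAG


def head_fragment(doc: Dict[str, Any], is_latest_published: bool) -> str:
    """Embed JSON-LD only for the latest published version."""
    return jsonld_script_tag(doc) if is_latest_published else ""


if __name__ == "__main__":
    doc = {"@context": "http://schema.org", "@type": "Dataset",
           "name": "Tricky </script> title"}
    print(head_fragment(doc, is_latest_published=True))
    print(repr(head_fragment(doc, is_latest_published=False)))
```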
The additional metadata export should be added to the list of exports in the Admin Guides' Automatic Export section.
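Alongside the pulldown, the same export should be reachable over HTTP (commit ba9c6bd documents "schema.org" as a supported export format in the API). The Python sketch below only builds the request URL; it assumes the `/api/datasets/export` endpoint with `exporter` and `persistentId` query parameters as described in the Dataverse API Guide, and the server URL and DOI are placeholders.

```python
from urllib.parse import urlencode


def export_url(server_url: str, persistent_id: str, exporter: str = "schema.org") -> str:
    """Build a metadata export URL for one dataset (endpoint path assumed)."""
    # urlencode percent-encodes ":" and "/" inside the DOI.
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{server_url.rstrip('/')}/api/datasets/export?{query}"


if __name__ == "__main__":
    print(export_url("https://dataverse.example.edu/", "doi:10.5072/FK2/EXAMPLE"))
```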