Feature Request/Idea: Nested compound fields #9200

Open
vera opened this issue Nov 30, 2022 · 3 comments
Labels
Feature: Metadata Type: Feature a feature request User Role: Depositor Creates datasets, uploads data, etc.

Comments

@vera
Contributor

vera commented Nov 30, 2022

Overview of the Feature Request

Currently, Dataverse doesn't fully support nested compound fields (i.e. compound fields within compound fields). More precisely, the UI cannot display them, although the API and the internal data model already appear to support them. In some configurations, the Solr schema is also not updated correctly.

What kind of user is the feature intended for?

Any user using the UI to view/edit datasets (curator, depositor, guest, superuser...) and API users

What inspired the request?

We would like to use custom metadata schemas that contain nested compound fields.

DataCite versions 4.1 and later also contain fields like this. See, for example, "7. Contributor" (0-n) with its children "7.4 nameIdentifier" (0-n) and "7.5 affiliation" (0-n) in https://schema.datacite.org/meta/kernel-4.4/doc/DataCite-MetadataKernel_v4.4.pdf

I have tested the following three basic scenarios of nested compound fields (load custom TSV, upload dataset via API, export dataset via API):

  1. A compound field nested within a compound field
  2. Like 1, but the outer field has allowmultiple TRUE
  3. Like 1, but both fields have allowmultiple TRUE

The TSV files and JSON datasets are here: nested_compound_test.tar.gz

Although the UI cannot display the nested fields, in scenarios 1 and 3 the export matched the initial upload, so the API and the internal data model already seem to support this feature. In scenario 2, the export failed with an error (Exception thrown from bean: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://solr:8983/solr/collection1: ERROR: [doc=dataset_9_draft] multiple values encountered for non multiValued field). After I manually edited the Solr schema file to set multiValued="true", the export succeeded in this scenario as well.
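The pattern across the three scenarios is consistent with a simple rule: a leaf field can receive several values in one Solr document as soon as it, or any compound field above it, is repeatable. The sketch below illustrates that rule in Python. It is not taken from the Dataverse code base; the `FieldDef` type and the field names `outer`, `inner` and `leaf` are hypothetical. The narrower `parent_only_check` is only a guess at how a rule could let scenario 2 slip through, not a confirmed description of the current behavior.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical, simplified stand-in for one field row of a metadata block TSV."""
    name: str
    allow_multiple: bool
    parent: Optional[str] = None


def needs_multivalued(name: str, fields: Dict[str, FieldDef]) -> bool:
    """Return True if the field or any of its ancestors is repeatable."""
    current: Optional[str] = name
    while current is not None:
        field = fields[current]
        if field.allow_multiple:
            return True
        current = field.parent
    return False


def parent_only_check(name: str, fields: Dict[str, FieldDef]) -> bool:
    """Narrower rule (an assumption): look only at the field and its direct parent."""
    field = fields[name]
    if field.allow_multiple:
        return True
    parent = fields[field.parent] if field.parent is not None else None
    return parent is not None and parent.allow_multiple


def scenario(outer_multiple: bool, inner_multiple: bool) -> Dict[str, FieldDef]:
    """Build outer -> inner -> leaf, where the leaf itself is never repeatable."""
    defs = [
        FieldDef("outer", outer_multiple),
        FieldDef("inner", inner_multiple, parent="outer"),
        FieldDef("leaf", False, parent="inner"),
    ]
    return {d.name: d for d in defs}


# The three scenarios tested in this issue.
SCENARIOS = {
    1: scenario(outer_multiple=False, inner_multiple=False),
    2: scenario(outer_multiple=True, inner_multiple=False),
    3: scenario(outer_multiple=True, inner_multiple=True),
}

for number, fields in SCENARIOS.items():
    full = needs_multivalued("leaf", fields)
    narrow = parent_only_check("leaf", fields)
    print(f"scenario {number}: ancestor walk={full}, parent only={narrow}")
```

Under these assumptions the two checks agree for scenarios 1 and 3 and disagree only for scenario 2, which is the one that needed the manual multiValued="true" edit.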

What existing behavior do you want changed?

The Solr schema should be updated correctly when a TSV with nested compound fields is uploaded.

Any brand new behavior do you want to add to Dataverse?

The UI should be able to display nested compound fields when creating, editing and viewing a dataset.

Currently, it looks like this (empty because inner children aren't shown):

[Screenshot: "Add New Dataset - Root", taken 2022-11-30 at 17:15:55]

Any related open or closed issues to this feature request?

#377 is related. When allowing multiples of a compound child, in some cases the compound child itself should also be a compound, e.g.:

"Author" is a compound field. To allow an author to have multiple identifiers, the children "Identifier Scheme" and "Identifier" should be grouped in a repeatable nested compound field.
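To make the "Author" example concrete, here is a minimal sketch in Python of how such a value might look in dataset JSON. It follows the general field shape of Dataverse's native JSON (`typeName`, `typeClass`, `multiple`, `value`), but the nested field name `authorIdentifiers` is hypothetical, the scheme is simplified to a primitive field, and the names and identifiers are placeholders. It is not the content of the test files attached to this issue.

```python
import json
from typing import Any, Dict, List, Tuple


def leaf(type_name: str, value: str) -> Dict[str, Any]:
    """A single-valued primitive field (simplified; no controlled vocabulary)."""
    return {
        "typeName": type_name,
        "typeClass": "primitive",
        "multiple": False,
        "value": value,
    }


def identifier(scheme: str, value: str) -> Dict[str, Any]:
    """One entry of the hypothetical repeatable nested compound field."""
    return {
        "authorIdentifierScheme": leaf("authorIdentifierScheme", scheme),
        "authorIdentifier": leaf("authorIdentifier", value),
    }


# Outer compound "author" (repeatable) containing a nested, repeatable
# compound "authorIdentifiers" (hypothetical name, placeholder values).
author_field: Dict[str, Any] = {
    "typeName": "author",
    "typeClass": "compound",
    "multiple": True,
    "value": [
        {
            "authorName": leaf("authorName", "Doe, Jane"),
            "authorIdentifiers": {
                "typeName": "authorIdentifiers",
                "typeClass": "compound",
                "multiple": True,
                "value": [
                    identifier("ORCID", "placeholder-1"),
                    identifier("ISNI", "placeholder-2"),
                ],
            },
        }
    ],
}


def list_identifiers(field: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Flatten the nested structure into (author, scheme, identifier) rows."""
    rows = []
    for author in field["value"]:
        name = author["authorName"]["value"]
        for entry in author["authorIdentifiers"]["value"]:
            rows.append((
                name,
                entry["authorIdentifierScheme"]["value"],
                entry["authorIdentifier"]["value"],
            ))
    return rows


# The structure survives a JSON round trip unchanged.
assert json.loads(json.dumps(author_field)) == author_field

for row in list_identifiers(author_field):
    print(row)
```

Grouping scheme and identifier in their own repeatable compound keeps each scheme paired with its identifier, which a flat list of repeated children on "Author" could not guarantee.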

(See also "7. Contributor" (0-n), "7.4 nameIdentifier" (0-n), "7.4.a nameIdentifierScheme", "7.4.b schemeURI" in https://schema.datacite.org/meta/kernel-4.4/doc/DataCite-MetadataKernel_v4.4.pdf (present since DataCite v4.1 as property 7.5))

cc @johannes-darms

@poikilotherm
Contributor

For the Solr schema discussion, you might be interested in #5989 and #7662.

I did some work on that before: see here for #5989 and here for the newer approach in #7662, which includes an (unfinished) TSV block parser to validate blocks and create the complete Solr core template as a package.

@pdurbin
Member

pdurbin commented Jan 24, 2023

@vera thanks for opening this issue! For now I'm just adding ".txt" to the files in your tarball so I can more easily look at them from this issue. I might update this comment in a bit with something more substantial to say. 😄
