Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: correctly set "multiValued" property in Solr schema for deeply nested fields #11136

Merged

Conversation

vera
Copy link
Contributor

@vera vera commented Jan 8, 2025

What this PR does / why we need it:

This PR fixes a bug in the /api/admin/index/solr/schema endpoint reported in #9200. The multiValued property was not set correctly for some deeply nested metadata fields, causing indexing errors:

org.apache.solr.common.SolrException: ERROR: [doc=dataset_1479] multiple values encountered for non multiValued field

This is because the code only checked for whether the direct parent of a metadata field is multi-valued to see whether a child metadata field must be declared as multi-valued in the Solr schema. However, all ancestor fields (parents, grandparents, etc.) need to be checked. This PR updates the code to handle deeper metadata nesting properly.

Which issue(s) this PR closes:

This is related to #9200, but doesn't close the issue, because it also describes UI issues related to deeply nested fields, but it's a fix of the Solr schema bug described there.

Special notes for your reviewer:

/

Suggestions on how to test this:

  1. Upload TSV metadata block file e.g. nested_compound_test2.csv (had to rename to CSV to upload here)
  2. Activate metadata block
  3. Use /api/admin/index/solr/schema to update Solr's schema.xml
  4. Reload Solr

(see https://guides.dataverse.org/en/latest/admin/metadatacustomization.html#updating-the-solr-schema)

  1. Upload dataset as JSON e.g. nested_compound_test2.json
  2. Confirm the dataset indexes without errors

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

/

Is there a release notes update needed for this change?:

/

Additional documentation:

/

@coveralls
Copy link

coveralls commented Jan 8, 2025

Coverage Status

coverage: 22.754% (+0.06%) from 22.693%
when pulling a236d5c on vera:fix/9200-allowmultiples-for-deep-nesting
into 3352bb7 on IQSS:develop.

@ofahimIQSS ofahimIQSS added the Size: 3 A percentage of a sprint. 2.1 hours. label Jan 28, 2025
@cmbz cmbz added FY25 Sprint 15 FY25 Sprint 15 (2025-01-15 - 2025-01-29) FY25 Sprint 16 FY25 Sprint 16 (2025-01-29 - 2025-02-12) labels Jan 29, 2025
@pdurbin pdurbin self-assigned this Jan 31, 2025
@pdurbin
Copy link
Member

pdurbin commented Jan 31, 2025

Before switching to this branch, I'm just adding a some detail of the errors one sees on the develop branch when trying to create a dataset using the JSON and TSV above:

dev_dataverse> request: http://solr:8983/solr/collection1
dev_dataverse> Remote error message: ERROR: [doc=dataset_3_draft] multiple values encountered for non multiValued field identifier_type2: [ORCID, ORCID]
...
dev_solr> 2025-01-31 16:43:46.984 ERROR (qtp1331270134-27) [c: s: r: x:collection1 t:null-27] o.a.s.h.RequestHandlerBase Client exception => org.apache.solr.common.SolrException: ERROR: [doc=dataset_3_draft] multiple values encountered for non multiValued field identifier_type2: [ORCID, ORCID]

Also, I don't think this will surprise anyone, but the JSF UI does not support deeply nested fields. It's completely out of scope to fix this in JSF. It looks like this when you click "create dataset":

Screenshot 2025-01-31 at 11 34 42 AM

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works great. I added a release note. Approved.

Here are the fields that were added when I updated the Solr schema:

328a329,331
>     <field name="identifier2" type="text_en" multiValued="true" stored="true" indexed="true"/>
>     <field name="identifier_text2" type="text_en" multiValued="true" stored="true" indexed="true"/>
>     <field name="identifier_type2" type="text_en" multiValued="true" stored="true" indexed="true"/>
349a353
>     <field name="person2" type="text_en" multiValued="true" stored="true" indexed="true"/>
570a575,577
>     <copyField source="identifier2" dest="_text_" maxChars="3000"/>
>     <copyField source="identifier_text2" dest="_text_" maxChars="3000"/>
>     <copyField source="identifier_type2" dest="_text_" maxChars="3000"/>
591a599
>     <copyField source="person2" dest="_text_" maxChars="3000"/>

I also made this related PR:

@pdurbin pdurbin removed their assignment Jan 31, 2025
@ofahimIQSS ofahimIQSS self-assigned this Feb 4, 2025
@ofahimIQSS
Copy link
Contributor

PR looking great. I followed the testing steps and did further regression testing. No issues found. Release snippet looks good as well.

image

Server log attached
server.log --- Testing PR 11136.txt

@ofahimIQSS ofahimIQSS merged commit aebec05 into IQSS:develop Feb 4, 2025
11 checks passed
@ofahimIQSS ofahimIQSS removed their assignment Feb 4, 2025
@pdurbin pdurbin added this to the 6.6 milestone Feb 4, 2025
@vera vera deleted the fix/9200-allowmultiples-for-deep-nesting branch February 5, 2025 09:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FY25 Sprint 15 FY25 Sprint 15 (2025-01-15 - 2025-01-29) FY25 Sprint 16 FY25 Sprint 16 (2025-01-29 - 2025-02-12) Size: 3 A percentage of a sprint. 2.1 hours.
Projects
Status: Done 🧹
Development

Successfully merging this pull request may close these issues.

5 participants