Spike: What work has already been done towards support for controlled vocabularies for metadata fields #8571
Brief summary: There's a good chance that Dataverse already offers sufficient support for controlled vocabularies to achieve the goals of the NIH grant without much development/coding work being necessary. That is, most of the work needed will concern defining the actual metadata standards and the Controlled Vocabulary Values (CVVs) themselves. Some JavaScript coding may be necessary if we end up using the External Vocabulary mechanism for importing the CVVs on the Dataverse side.

Support for Controlled Vocabularies in the Dataverse software

In order to use CVVs in Dataverse metadata fields, the vocabulary needs to be defined and imported in one of two supported ways:

1. As part of a metadata block definition - the standard built-in mechanism described in the Metadata Customization guide (a rough sketch of generating such a definition from an externally served dictionary follows below); or
2. Via the External Vocabulary Services mechanism, which delegates term lookups to a remote service with the help of JavaScript support scripts.
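To make option 1. concrete: the controlled vocabulary terms live in the #controlledVocabulary section of a metadata block TSV, and if a dictionary is served externally, generating that section could itself be scripted. The sketch below is only an illustration of the idea; the source URL, field name, and JSON shape are placeholders I made up, and the column layout should be double-checked against the Metadata Customization guide and the shipped citation.tsv block.

```javascript
// Hypothetical sketch only: convert an externally served dictionary into the
// #controlledVocabulary section of a Dataverse metadata block TSV.
// SOURCE_URL, DATASET_FIELD, and the JSON shape are placeholders, not real values.
const SOURCE_URL = 'https://example.org/dictionaries/specimen-types.json';
const DATASET_FIELD = 'specimenType'; // would have to match a field defined in the #datasetField section

async function buildControlledVocabularySection() {
  const response = await fetch(SOURCE_URL); // global fetch is available in Node 18+
  const terms = await response.json();      // assumed shape: [{ id, label }, ...]

  // Header row of the #controlledVocabulary section, as used by the shipped citation.tsv
  const lines = ['#controlledVocabulary\tDatasetField\tValue\tidentifier\tdisplayOrder'];
  terms.forEach((term, index) => {
    // Data rows leave the first column empty (hence the leading tab)
    lines.push(`\t${DATASET_FIELD}\t${term.label}\t${term.id}\t${index}`);
  });
  return lines.join('\n');
}

buildControlledVocabularySection().then((tsv) => console.log(tsv));
```

The generated rows would then be appended to a block TSV that also defines the field itself, and loaded through the admin metadata block import described in the guide. Whether this is worth doing depends on the size, format, and volatility of the source dictionaries, as discussed in the next paragraph.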
The grant document linked and quoted in the description of the spike above explicitly says "integrate with external services". This implies that we will be using solution 2. above for integrating with these CVVs from "standardized, widely used data dictionaries". However, we should keep in mind that there is also the possibility of achieving this integration using the standard built-in mechanism 1. - by creating a metadata block (or expanding the existing Biomedical block) and defining the CVVs as part of it, perhaps with a scripted solution (like the sketch above) for retrieving the dictionaries from external sources and encoding them as standard Dataverse block definition files. Whether that is viable depends on the specifics of the dictionaries and definitions in question: how large the vocabulary is, in what format it is served remotely, and how often we should expect it to change. (There is some discussion of this in the Metadata Customization guide linked above. If using the External Service solution is a fixed decision that has already been made, we can skip this step.)

As far as "what's next" is concerned, the most logical next step appears to be this (again, quoting item 4. under "Aim 2" in the grant description):
I.e., we need a better idea of the actual metadata specifications that we will need to support, and/or of how these definitions will be served to us externally.

The support for External Vocabulary Services is implemented in part by supplying JavaScript code that interfaces with the remote provider and assists with populating the metadata edit forms in the Dataverse UI. Scripts that support Skosmos and ORCID are provided as standard in the dedicated repository; scripts supporting additional protocols may be available in the same repository, contributed by the Dataverse developer community. If Dataverse needs to integrate with remote CVs served via a protocol that is not yet supported, more custom scripts will need to be developed. Hence figuring out these details is the next logical step of the effort.
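For a sense of what the external-service side of this looks like, here is a minimal, standalone sketch of the kind of remote lookup such a support script performs, assuming a Skosmos-style REST endpoint (the Finto base URL below is just an example of a public Skosmos instance). The real scripts in the repository also take care of wiring the results into the Dataverse edit form, which is not shown here.

```javascript
// Illustration only: the kind of remote lookup an external-vocabulary support
// script performs. The real scripts also populate the Dataverse metadata edit
// form with the results; that part is omitted here.
const SKOSMOS_BASE = 'https://api.finto.fi/rest/v1';

// Query a Skosmos server for concepts whose labels match the user's input and
// return { label, uri } pairs suitable for a type-ahead suggestion list.
async function searchConcepts(query, lang = 'en') {
  const url = `${SKOSMOS_BASE}/search?query=${encodeURIComponent(query + '*')}&lang=${lang}`;
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Skosmos search failed with HTTP ${response.status}`);
  }
  const body = await response.json();
  // The search endpoint returns a "results" array of matching concepts.
  return (body.results || []).map((concept) => ({
    label: concept.prefLabel,
    uri: concept.uri,
  }));
}

// Example usage: searchConcepts('epidemiolog').then((hits) => console.log(hits));
```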
For the "demonstration of what is found to be implemented already in dataverse" item on the checkbox list (that I missed earlier):
(Thinking about it, the list of valid language names from the last example would also be a prime use case for being supplied by an External Service - pulling it directly from the ISO-639 definition page, for example. That would eliminate us as the middleman having to maintain/replicate the list in our own block distribution... But this is of course entirely outside the scope of this spike issue; see the rough sketch below.)
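Purely as an illustration of that side note (and, again, out of scope for this spike): a script along these lines could pull the language list from a public source rather than from a copy we maintain. The Library of Congress URL and its pipe-delimited layout are assumptions that would need to be verified.

```javascript
// Rough sketch of the side note above, out of scope for this spike: pull the
// ISO-639 language list from a public source instead of maintaining our own copy.
// The URL and the pipe-delimited layout are assumptions to be verified.
const ISO639_URL = 'https://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt';

async function fetchLanguageNames() {
  const response = await fetch(ISO639_URL);
  const text = await response.text();
  // Assumed layout per line: alpha-3(B)|alpha-3(T)|alpha-2|English name|French name
  return text
    .trim()
    .split('\n')
    .map((line) => line.split('|'))
    .filter((cols) => cols.length >= 4)
    .map((cols) => ({ code: cols[0], name: cols[3] }));
}

// Example usage: fetchLanguageNames().then((langs) => console.log(langs.length));
```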
@scolapasta OK, great. I left the checkboxes under the "definition of done" unchecked; I wanted a reviewer to do the clicking if they were satisfied with what I wrote. I can do it now; not that it matters much, just being thorough/OCD. The only thing I could think of adding to the above: there are some subtle differences in behavior between locally imported and external CVs (really, different meanings of "controlled"). This could be an area where some dev effort would be needed - if, for example, we were to use the external model for the GREI vocabularies but wanted them to function 1:1 like fixed CVs defined in metadata blocks. But, again, we need to know more about the actual metadata standards and vocabularies we will be working with in order to discuss that.
@landreev I went ahead and checked them. |
This is in support of:
The first step is to figure out what has already been done by the Dataverse team and by the community towards this aim. The focus here is on the general area of controlled vocabularies, as opposed to specific biomedical vocabularies.
For example:
And then to figure out what the next steps are.
Def of done
As completely as is reasonably possible in a 2 week period (sprint):
Search out previous related work that has been done by the Harvard Dataverse team
Search out previous work done within the community
Demonstration of what is found to be implemented already in Dataverse
Define what's next
Aim 2:
Increase support for biomedical and cross-domain metadata standards and controlled vocabularies
One of the useful characteristics of the Dataverse open-source software is its extensive support for metadata standards and additional custom metadata. The standards currently supported include the Data Documentation Initiative (DDI), Dublin Core, DataCite, and Schema.org.
In particular, DDI makes a Dataverse repository interoperable even at the variable/attribute level since it supports variable descriptive and statistical metadata. This allows data exploration and analysis tools to integrate easily with the repository and discovery engines to find variable information.
In this project, we propose to
Related documents