Dataverse integration #1

Open
pdurbin opened this issue Oct 23, 2019 · 12 comments

Comments

@pdurbin

pdurbin commented Oct 23, 2019

Hi! I'm here because of https://twitter.com/GenevievMichaud/status/1186933912255291392 and DMap looks interesting! Let's talk about the possibility of integration with Dataverse, possibly through an external tool! Please see http://guides.dataverse.org/en/4.17/api/external-tools.html

Also, I'd be remiss if I didn't point out that it should be "Dataverse" instead of "DataVerse" like in the screenshot below:

Screen Shot 2019-10-23 at 6 02 21 AM

Thanks!

@oblassers
Owner

Dear Philip,

thank you for your interest in DMap and my apologies for the late response! DMap is a research prototype and demo tool developed to explore use cases of machine-actionable DMPs. Repository integration is an obvious use case for machine-actionable DMPs. To allow an information exchange between DMPs and RDM systems, a common data model for DMPs is being developed by the RDA DMP Common Standards WG (see https://github.com/RDA-DMP-Common/RDA-DMP-Common-Standard).

At my university, prototypes were developed in this direction. I would like to point you to https://hido1994.github.io/madmp/ which shows an integration with Dataverse. Information specified in a DMP (e.g. license, dataset name etc.) can be used to populate metadata fields in the repository system. On the other hand a PID assigned to a dataset by the repository could flow back into the DMP.
Another use case for repository integration could be to inform repositories about planned deposits (type, amount, ...) so they can inform researchers about curating the data and select suitable metadata standards, so the transition to the repository at the end of a project becomes smoother. Use cases of maDMPs, collected by the community, are described here https://doi.org/10.3897/rio.3.e13086 and principles to realise the maDMP vision can be found here https://doi.org/10.1371/journal.pcbi.1006750

DMap currently focuses on providing support to researchers by automating the workflow of creating a DMP and exports a DMP in the RDA DMP Common Standard format (JSON). Any RDM service supporting the common data model could help in making systems better integrated.
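
To make the idea of populating repository metadata from an exported DMP more concrete, here is a minimal sketch. It assumes the JSON field names (`dataset`, `distribution`, `data_access`, `license`, `license_ref`) match the RDA DMP Common Standard, which should be checked against the specification. The output keys (`name`, `access`, `license`) are hypothetical and are not actual Dataverse metadata fields.

```python
import json

# Example maDMP fragment. Field names follow one reading of the RDA DMP
# Common Standard; verify them against the specification before use.
MADMP_JSON = """
{
  "dmp": {
    "title": "Example DMP",
    "dataset": [
      {
        "title": "Survey images",
        "distribution": [
          {
            "data_access": "open",
            "license": [
              {"license_ref": "https://creativecommons.org/publicdomain/zero/1.0/",
               "start_date": "2021-01-01"}
            ]
          }
        ]
      }
    ]
  }
}
"""


def dataset_metadata(madmp):
    """Flatten each maDMP dataset into a dict of repository-style fields.

    The output keys are hypothetical placeholders, not Dataverse field names.
    Only the first distribution and its first license are considered.
    """
    records = []
    for ds in madmp["dmp"].get("dataset", []):
        dist = (ds.get("distribution") or [{}])[0]
        lic = (dist.get("license") or [{}])[0]
        records.append({
            "name": ds.get("title"),
            "access": dist.get("data_access"),
            "license": lic.get("license_ref"),
        })
    return records


records = dataset_metadata(json.loads(MADMP_JSON))
print(records)
```

A real integration would still need to translate these values into whatever metadata fields the target repository actually exposes.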

I'm happy to further discuss Dataverse integration.

Best wishes,
Simon

PS. Also, thanks for pointing out the misspelling of Dataverse. Unfortunately I cannot change it because it is controlled vocabulary specified in the re3data schema (http://doi.org/10.2312/re3.007).

@pdurbin
Author

pdurbin commented Oct 29, 2019

@oblassers your thoughtful reply was well worth the tiny wait! Thank you.

I first heard about RDA-DMP-Common-Standard from @TomMiksa at IQSS/dataverse#5859 (comment) and passed that information along at https://groups.google.com/d/msg/dataverse-community/JJB33tqykrI/LJUvgnevAgAJ which in my mind is the main thread on the Dataverse mailing list where people are talking about DMP.

From a quick look at https://hido1994.github.io/madmp/ I have a few observations:

workflow

@mercecrosas

The integration with DMap can be very useful for Dataverse users who need a DMP. Would this mean that Dataverse would have a metadata block to support the DMP machine-actionable metadata?

@TomMiksa

Hi all,

I am really happy to see that you're interested in our work.

A few clarifications from my side to make sure we're on the same page:

  • https://hido1994.github.io/madmp/ - @Hido1994 considered here two scenarios:
    (1) DMP was created first, e.g. with the use of a tool like DMAP, and the information it contains was used to automate the upload of data into Dataverse;
    (2) Data is already in Dataverse and a researcher needs to update his/her DMP. He/she can export from Dataverse relevant information into the maDMP.
    Thus, we wanted to show how we can assist researchers at different stages of the research data lifecycle: from planning/proposal phase to reporting/end of project phase.

  • https://github.com/oblassers/dmap - DMAP is a tool developed by @oblassers based on interactive mock-ups (https://oblassers.github.io/dmap-mockups/), which in turn are a result of a consultation within the RDA DMP Common Standards WG and interviews with researchers at TU Wien. The primary goal of this tool, as explained by Simon, is to reduce the number of questions asked of researchers and maximise the reuse of information from existing systems, e.g. databases with information on projects, publications, employees, etc.

Thus, we have two tools, doing different things, but each of them being an important part of the RDM ecosystem around Dataverse.
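
Scenario (2) above, where information from the repository flows back into the DMP, could look roughly like the following sketch. It assumes the maDMP stores a dataset identifier under `dataset_id` with `identifier` and `type` keys, as in the RDA DMP Common Standard as I understand it. The function name `attach_pid` and the DOI value are hypothetical placeholders, and nothing here calls Dataverse itself.

```python
import copy


def attach_pid(madmp, dataset_title, doi):
    """Return a copy of the maDMP with a DOI recorded on the matching dataset."""
    updated = copy.deepcopy(madmp)  # leave the caller's DMP untouched
    matched = False
    for ds in updated["dmp"].get("dataset", []):
        if ds.get("title") == dataset_title:
            ds["dataset_id"] = {"identifier": doi, "type": "doi"}
            matched = True
    if not matched:
        raise KeyError(f"no dataset titled {dataset_title!r} in the DMP")
    return updated


plan = {"dmp": {"title": "Example DMP", "dataset": [{"title": "Survey images"}]}}
# Placeholder DOI, not a real identifier.
updated_plan = attach_pid(plan, "Survey images", "10.1234/EXAMPLE")
print(updated_plan["dmp"]["dataset"][0]["dataset_id"])
```

Matching on the dataset title is only for illustration; a real tool would need a more reliable key to pair DMP entries with deposited datasets.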

In my opinion, the exchange of information between repositories and maDMPs is one of the key use cases in which we can automate a lot, and thus bring a lot of benefits to both researchers and repository managers. I am happy to jointly explore further ideas and integrations!

Cheers,
Tomasz

@shlake

shlake commented Oct 30, 2019

@pdurbin I've sent my re3Data peeps an email about correcting "Dataverse" in the "softwareNames" vocab

@mercecrosas

mercecrosas commented Oct 30, 2019 via email

@oblassers
Owner

@mercecrosas good question!
DMap is designed to store DMPs in its own database. If you want to make a DMP citable it could be useful to archive it in a repository and get a PID assigned to it. I think a DMP would not necessarily have to be archived together with the datasets since the DMP contains dataset entities which could point to any location.

As @TomMiksa pointed out and @Hido1994's application shows, it could be useful to have maDMP-friendly metadata in the repository, together with programmatic access to it, which could be used to update a DMP.

@TomMiksa

The DMP contains information on a dataset like: license, access mode (open/closed/shared), embargo period, etc. This information helps in ingesting the data into Dataverse and constitutes the dataset's metadata. For example, a DMP states: "a collection of JPEG images will be shared under a CC0 license after a 1 year embargo at the Harvard Dataverse repository". When the files are uploaded to Dataverse, specific metadata fields are set in Dataverse and are presented on the landing page of the dataset: license, embargo, etc. For this reason, I believe there is no need to publish the JSON file together with the dataset.
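
As a small illustration of how a statement like "1 year embargo" could be turned into a concrete date at ingest time, here is a minimal sketch using only the Python standard library. The function names and the idea of counting the embargo from the deposit date are assumptions made for the example, not something the DMP standard or Dataverse prescribes.

```python
from datetime import date


def embargo_end(deposit, years=1):
    """Return the date on which an embargo of `years` years ends."""
    try:
        return deposit.replace(year=deposit.year + years)
    except ValueError:
        # Deposit on Feb 29 and the target year is not a leap year:
        # fall back to Feb 28.
        return deposit.replace(year=deposit.year + years, day=28)


def is_under_embargo(deposit, today, years=1):
    """True while `today` is before the embargo end date."""
    return today < embargo_end(deposit, years)


deposit_date = date(2019, 10, 30)
print(embargo_end(deposit_date))
print(is_under_embargo(deposit_date, date(2020, 5, 15)))
```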

Another use case to consider is publishing DMPs on their own. For example, a repository of DMPs that point to datasets. Imagine a situation in which a researcher would like to find out which projects used a specific dataset within the last 2 years. I know that CDL and DataCite are investigating the idea of assigning DOIs to DMPs.
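
The "which projects used this dataset" question could be sketched as a simple filter over a collection of maDMPs. This assumes each DMP carries a `modified` date, a `project` list with titles, and `dataset_id.identifier` values, following my reading of the RDA DMP Common Standard. Using `modified` as the date of use, plain `YYYY-MM-DD` strings, and 365-day years are simplifications made for the example, and all identifiers below are placeholders.

```python
from datetime import date, timedelta


def projects_using(dmps, dataset_identifier, today, years=2):
    """List project titles from DMPs that reference the dataset recently."""
    cutoff = today - timedelta(days=365 * years)  # approximate years
    titles = []
    for doc in dmps:
        dmp = doc["dmp"]
        if date.fromisoformat(dmp["modified"]) < cutoff:
            continue  # DMP is older than the requested window
        ids = {
            ds.get("dataset_id", {}).get("identifier")
            for ds in dmp.get("dataset", [])
        }
        if dataset_identifier in ids:
            titles.extend(p["title"] for p in dmp.get("project", []))
    return titles


def make_dmp(modified, project, identifier):
    """Build a tiny maDMP-like dict for the example."""
    return {"dmp": {
        "modified": modified,
        "project": [{"title": project}],
        "dataset": [{"dataset_id": {"identifier": identifier, "type": "doi"}}],
    }}


dmps = [
    make_dmp("2019-01-15", "Project A", "10.1234/X"),
    make_dmp("2016-03-01", "Project B", "10.1234/X"),
    make_dmp("2019-05-01", "Project C", "10.1234/Y"),
]
recent = projects_using(dmps, "10.1234/X", today=date(2019, 11, 1))
print(recent)
```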

@oblassers
Owner

> Another use case to consider is publishing DMPs on their own. For example, a repository of DMPs that point to datasets. Imagine a situation in which a researcher would like to find out which projects used a specific dataset within the last 2 years. I know that CDL and DataCite are investigating the idea of assigning DOIs to DMPs.

I know that @kjgarza is working on this.

@pdurbin
Author

pdurbin commented Nov 1, 2019

@TomMiksa I see your paper was mentioned in this blog post from last week: https://researchdataq.org/editorials/the-boilerplate-problem-in-data-management-plans/

@TomMiksa

Hi everyone,
I'm writing to let you know that we're organizing a hackathon in which we're trying out different integrations using maDMPs. Would you be interested in joining as a team?
It would be cool to have a team that would work on connecting maDMPs with Dataverse. We already have teams working on similar topics.

Here you can find details on the event, including teams that signed up so far:
https://github.com/RDA-DMP-Common/hackathon-2020

Cheers,
Tomasz

@pdurbin
Author

pdurbin commented May 15, 2020

@TomMiksa you just reminded me to reach out to you to see if you (or others here) would like to participate in the session about external tools ( https://projects.iq.harvard.edu/dcm2020/breakout-sessions ) for the upcoming Dataverse conference (June 17-19). I sent you an email with more details. Thanks!
