integrate with DataVerse #269
see also IQSS/dataverse#2038 (comment) |
curl -L "https://dataverse.harvard.edu/api/search?q=fileMd5:48a76222cf5c06cb4f2d8f75cc0caa63" \
  | jq .
yields:
{
"status": "OK",
"data": {
"q": "fileMd5:48a76222cf5c06cb4f2d8f75cc0caa63",
"total_count": 1,
"start": 0,
"spelling_alternatives": {},
"items": [
{
"name": "Auter Fine PB Replication Code.txt",
"type": "file",
"url": "https://dataverse.harvard.edu/api/access/datafile/2829688",
"file_id": "2829688",
"published_at": "2016-05-18T17:57:24Z",
"file_type": "Plain Text",
"file_content_type": "text/plain",
"size_in_bytes": 2065,
"md5": "48a76222cf5c06cb4f2d8f75cc0caa63",
"checksum": {
"type": "MD5",
"value": "48a76222cf5c06cb4f2d8f75cc0caa63"
},
"file_persistent_id": "doi:10.7910/DVN/TGKZ2T/Y1ZZXT",
"dataset_name": "Replication Data for: \"Negative Campaigning in the Social Media Age: Attack Advertising on Facebook\"",
"dataset_id": "2829686",
"dataset_persistent_id": "doi:10.7910/DVN/TGKZ2T",
"dataset_citation": "Auter, Zachary, 2016, \"Replication Data for: \"Negative Campaigning in the Social Media Age: Attack Advertising on Facebook\"\", https://doi.org/10.7910/DVN/TGKZ2T, Harvard Dataverse, V1, UNF:6:LSx44nECMNQun46yUutUuA== [fileUNF]"
}
],
"count_in_response": 1
}
} |
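For readers who want to script the lookup above, here is a minimal Python sketch. It only builds the search URL and pulls file access URLs out of a response shaped like the one shown; the helper names `md5_search_url` and `file_access_urls` are made up for illustration, and the sample dictionary is a trimmed copy of the JSON above rather than a live result.

```python
from urllib.parse import urlencode


def md5_search_url(base_url: str, md5_hex: str) -> str:
    """Build a search URL that looks up a file by MD5 (hypothetical helper)."""
    # urlencode percent-encodes the ':' in the fileMd5:<hex> query term.
    query = urlencode({"q": f"fileMd5:{md5_hex}"})
    return f"{base_url.rstrip('/')}/api/search?{query}"


def file_access_urls(response: dict) -> list[str]:
    """Collect the 'url' of every item of type 'file' in a search response."""
    items = response.get("data", {}).get("items", [])
    return [
        item["url"]
        for item in items
        if item.get("type") == "file" and "url" in item
    ]


# Trimmed copy of the response shown above, used here as sample input.
sample = {
    "status": "OK",
    "data": {
        "total_count": 1,
        "items": [
            {
                "name": "Auter Fine PB Replication Code.txt",
                "type": "file",
                "url": "https://dataverse.harvard.edu/api/access/datafile/2829688",
                "md5": "48a76222cf5c06cb4f2d8f75cc0caa63",
            }
        ],
    },
}

print(md5_search_url("https://dataverse.harvard.edu",
                     "48a76222cf5c06cb4f2d8f75cc0caa63"))
print(file_access_urls(sample))
```

Fetching the URL itself (for example with `urllib.request.urlopen`) is left out so the sketch runs without network access.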
with a list of endpoints available via -
yielding
|
where the installation info comes from a crowd sourced google sheet at - https://docs.google.com/spreadsheets/d/1bfsw7gnHlHerLXuk7YprUT68liHfcaMxs1rFciA-mEo/edit#gid=0 |
A first pass at integrating with the "DataVerse" should be available in the upcoming Preston release.
Example 1 - query against a specific DataVerse endpoint
time preston cat --remote https://dataverse.harvard.edu hash://md5/48a76222cf5c06cb4f2d8f75cc0caa63 | head
yielded
and took
Example 2: query against all registered DataVerse endpoints
Using the "magic" host dataverse.org, Preston will try to find all registered DataVerse endpoints and ask them for the content.
time preston cat --remote https://dataverse.org hash://md5/48a76222cf5c06cb4f2d8f75cc0caa63 | head
yields
and took
Note that Example 2 may take a while to complete, because Preston goes down a list of about 100 servers, querying each in turn until one of them claims to have the content. Optimization may help to reduce the response time if needed. |
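The sequential lookup described above can be sketched roughly as follows. This is an illustration of the general idea only, not Preston's actual implementation; `first_endpoint_with_content` and the `has_content` callback are hypothetical names, and the endpoint URLs in the usage lines are placeholders.

```python
from typing import Callable, Iterable, Optional


def first_endpoint_with_content(
    endpoints: Iterable[str],
    content_id: str,
    has_content: Callable[[str, str], bool],
) -> Optional[str]:
    """Ask each endpoint in turn; return the first one that claims the content."""
    for endpoint in endpoints:
        try:
            if has_content(endpoint, content_id):
                return endpoint
        except OSError:
            # Unreachable or misbehaving server: skip it and try the next one.
            continue
    return None


# Usage with a stand-in lookup instead of real network calls.
known = {"https://b.example.org": {"hash://md5/48a76222cf5c06cb4f2d8f75cc0caa63"}}


def fake_has_content(endpoint: str, content_id: str) -> bool:
    return content_id in known.get(endpoint, set())


print(first_endpoint_with_content(
    ["https://a.example.org", "https://b.example.org"],
    "hash://md5/48a76222cf5c06cb4f2d8f75cc0caa63",
    fake_has_content,
))
```

In the worst case every endpoint is queried once, which is why the response time grows with the length of the list; querying endpoints concurrently or remembering which endpoint answered last would be possible optimizations.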
Really cool! is dataverse only md5 based? |
Not sure, but DataVerse sure looks like an MD5-verse all over, and see IQSS/dataverse#3354 and gdcc/dataverse-kubernetes#68 (comment) |
after fixing #270 , the following screenshot was created for content rendered via: https://linker.bio/hash://md5/7d62417b5b689ed91dcd25f10c9c2132 |
@jhpoelen fun! I just started a thread in our chat about Preston. Please feel free to join in. If you'd like to present at a community call or record something for DataverseTV, please let me know! Oh, in ea7f9b5 I see you noticed the Dataverse installation in Maine is behaving differently API-wise. This is because it's running an old version of Dataverse (pre-4.x). |
@pdurbin happy to present at a community call. Please let me know when, and I'll try and make room in my schedule. |
@jhpoelen great! For now I added you to our planning doc for Feb 6. Thanks! |
@jhpoelen Happy New Year! Are you still interested in presenting at the Dataverse community call on Feb 6th? It's at 10am eastern time. |
@pdurbin presenting to your Dataverse community on 2024-02-06 at 10am eastern sounds like fun! Anything in particular you are interested in? Do you need an abstract / bio for the announcement? |
@jhpoelen we aren't very formal. I just updated https://dataverse.org/community-calls to say that you'll talk about how Preston was recently integrated with Dataverse. We often record these talks and put them on DataverseTV, but it's up to you. How much time would you like? 20 minutes? Plus time for Q&A? Thanks for your interest in talking about this integration! |
20 minutes plus time for Q&A sounds great! Looking forward to our discussions. |
@pdurbin Thanks for having me at the DataVerse Community meeting today. You can find the slides at: https://jhpoelen.nl/dataverse-talk-2024-02-06/#/title-slide and |
@jhpoelen thanks for a great presentation! I just announced that your talk is now on DataverseTV. For now I added a placeholder description but please feel free to suggest something better here or in the spreadsheet. |
@pdurbin Thanks again for the engaging conversation. Great to hear the different perspectives! For future reference, I've packaged (and signed) the slides, recording etc. in: Poelen, J. H. (2024, February 6). A DataVerse Beyond the Internet hash://md5/e34b50213fc407892d0810dabd742b1f. Zenodo. https://doi.org/10.5281/zenodo.10626561 Can you please include this citation in the DataVerseTV page? |
Also @pdurbin how can I best cite DataVerse and the DataVerse Community call? |
Also, I noticed that the DOI in the recommended citation for:
no longer resolved soon after the presentation.
yielded:
Luckily, the content id related to their signed citation:
still yields some results via non-dataverse sources like Zenodo and linker.bio (see below). Great to have such a good example of the dynamic internet in action. Also, I wonder what the cat did to get ejected from the dataverse . . . e.g.,
and
|
@jhpoelen hi! I fixed up the DataverseTV description. For the rest, it looks like you also posted to https://groups.google.com/g/dataverse-community/c/-n0mXap9qjg/m/XslA_jpOAAAJ I'd rather not reply in two places. Is it ok if I pick one? 😄 Yeah, we need a better way to cite Dataverse itself. There's discussion about that here: In short, please use this: Gary King. 2007. “An Introduction to the Dataverse Network as an Infrastructure for Data Sharing.” Sociological Methods and Research, 36, Pp. 173–199. And sorry, there's no way to cite the community call. I guess I'd suggest linking to the notes: https://docs.google.com/document/d/1t0eY4mh2f2aH6yhnzfyXF9J05yUgr8A5aMDIMyuae80/edit?usp=sharing |
@pdurbin thanks for your update and for sharing the links. I've used your information to update the description of: Poelen, J. H. (2024, February 6). A DataVerse Beyond the Internet hash://md5/e34b50213fc407892d0810dabd742b1f. Zenodo. https://doi.org/10.5281/zenodo.10626561 Happy to take suggestions on how to better represent and cite the great work that you and your colleagues are doing . . . |
Just to document the 404 page generated by the Harvard Data Verse URL https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/24358/N4FCVS on 12 Feb 2024 |
see also https://groups.google.com/g/dataverse-community/c/-n0mXap9qjg/m/pfuk0IudAAAJ and cross-posted text below - Hi Data-nauts, Dataversians, (What do you call folks inhabiting the DataVerse?) Julian asked:
In my published slides and recorded talk of the 6 Feb 2024 dataverse community call: Poelen, J. H. (2024, February 6). A DataVerse Beyond the Internet hash://md5/e34b50213fc407892d0810dabd742b1f. Zenodo. https://doi.org/10.5281/zenodo.10626561 , I asked the questions (see also https://jhpoelen.nl/dataverse-talk-2024-02-06/#/guiding-questions):
and proceeded to take the Harvard Kitty citation as suggested by Harvard Data Verse (HDV):
And less than a week later (not 40/50 years later), the (aspirationally) "Persistent Identifier" (aPID) doi:10.7910/DVN/24358/N4FCVS minted by the HDV no longer resolves (see attached screenshot), as if the kitty never existed. https://doi.org/10.7910/DVN/24358/N4FCVS I know that this is a sample size of N=1, but it does support my claim made later in the presentation (also see https://jhpoelen.nl/dataverse-talk-2024-02-06/#/how-to-retrieve-this-cat-picture-50-years-from-now):
Also, note that the signed citation (as proposed in my presentation): Joshua Carp, 2014, “cat.jpg”, CarpTest, https://doi.org/10.7910/DVN/24358/N4FCVS, Harvard Dataverse, V1 hash://md5/7d62417b5b689ed91dcd25f10c9c2132 allows for retrieving the cat picture via its digital fingerprint hash://md5/7d62417b5b689ed91dcd25f10c9c2132 :
while leaving open other known, or as of yet unknown, methods to retrieve published digital data via their signature. I hope this message helps to support that the case of the lost Harvard Kitty provides evidence to support my claim that our current way of citing (and resolving) digital datasets may need a little work beyond including aPIDs to help carry our digital knowledge into the future. Curious to hear your thoughts, -jorrit PS. I've attached a copy of the Harvard Kitty just to have another place to be able to retrieve the cute 4.5MB cat picture. |
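To illustrate why a signed citation stays useful even when the DOI stops resolving: whoever obtains a copy of the file, from any source, can check it against the fingerprint in the citation. Below is a minimal Python sketch of that check, assuming the `hash://md5/<hex>` form used in this thread (with `sha256` accepted as well for illustration); the function name `matches_hash_uri` is hypothetical and this is not how Preston itself does it.

```python
import hashlib
import re

# hash://<algorithm>/<lowercase hex digest>, limited here to two algorithms.
HASH_URI = re.compile(r"^hash://(md5|sha256)/([0-9a-f]+)$")


def matches_hash_uri(content: bytes, hash_uri: str) -> bool:
    """Return True if the content hashes to the digest named in the hash URI."""
    match = HASH_URI.match(hash_uri)
    if match is None:
        raise ValueError(f"unsupported hash URI: {hash_uri}")
    algorithm, expected_hex = match.groups()
    return hashlib.new(algorithm, content).hexdigest() == expected_hex


# Usage: check a local copy against the fingerprint in the citation.
# "cat.jpg" is a placeholder path for wherever the copy was saved.
# with open("cat.jpg", "rb") as f:
#     print(matches_hash_uri(f.read(), "hash://md5/7d62417b5b689ed91dcd25f10c9c2132"))
```

The check does not depend on where the bytes came from, so it works the same for a copy retrieved from Zenodo, linker.bio, or an email attachment.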
related to IQSS/dataverse#3436