doi, "start" parameter using the Search API #4377

Closed
cscn opened this issue Dec 11, 2017 · 3 comments
Comments

@cscn

cscn commented Dec 11, 2017

I'm attempting to use the Dataverse Search API to find the DOIs of all R files on a Dataverse server, and I've run into a couple of interesting behaviors. I'm using ".R" as my query and "file" as the type parameter.

  1. A GET request to https://dataverse.harvard.edu/api/search/ (with search parameters) returns a response containing a "total_count" field within the "data" field. However, specifying a "start" parameter beyond the expected number of pages continues to yield 10 results. How, then, should I determine the number of pages to request?
  2. The search API doesn't return a dedicated DOI field for each result. Instead, the DOI is embedded in the string returned in the "dataset_citation" field, e.g.,
    'Marquez, Javier, 2014, "MORENA (Parte 2): El efecto de spoiler", doi:10.7910/DVN/27462, Harvard Dataverse, V1'
    I've been working around this by parsing out the DOI using a regex, but it would be super nice to have it as a dedicated field.
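
The regex workaround described in item 2 could look roughly like the sketch below. The pattern and the helper name `extract_doi` are assumptions for illustration, not what was actually used; the pattern expects a `doi:` prefix followed by a `10.<registrant>/<suffix>` identifier that ends at the next comma or whitespace.

```python
import re

# Assumed pattern: "doi:" followed by "10.<digits>/<suffix>", where the
# suffix runs until the next comma or whitespace character.
DOI_PATTERN = re.compile(r"doi:(10\.\d{4,9}/[^\s,]+)")


def extract_doi(citation):
    """Return the DOI found in a citation string, or None if there is none."""
    match = DOI_PATTERN.search(citation)
    return match.group(1) if match else None


citation = (
    'Marquez, Javier, 2014, "MORENA (Parte 2): El efecto de spoiler", '
    "doi:10.7910/DVN/27462, Harvard Dataverse, V1"
)
print(extract_doi(citation))  # prints 10.7910/DVN/27462
```

A DOI suffix that itself contains a comma would be truncated by this pattern, which is one reason a dedicated field would be more robust.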

I would also love suggestions on how to scrape all datasets containing R files from Dataverse. Currently my process is to find the DOIs of all datasets containing .R files, find the file_ids of all files in each dataset, then download the files by file_id using the Data Access API. I originally posted this in the R client repo, IQSS/dataverse-client-r#19 (comment). Apologies for the pseudo-repost.
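
For the last two steps of that process (listing a dataset's files, then downloading each by file id), a sketch of the request URLs might look like this. The endpoint paths are assumptions based on the Dataverse native API and Data Access API guides and should be checked against the guides for your installation's version; the function names are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://dataverse.harvard.edu"


def dataset_files_url(doi, base_url=BASE_URL):
    """URL listing the files of the latest version of a dataset (assumed path)."""
    query = urlencode({"persistentId": f"doi:{doi}"})
    return f"{base_url}/api/datasets/:persistentId/versions/:latest/files?{query}"


def file_download_url(file_id, base_url=BASE_URL):
    """URL for downloading one file by its id via the Data Access API (assumed path)."""
    return f"{base_url}/api/access/datafile/{file_id}"


print(dataset_files_url("10.7910/DVN/27462"))
print(file_download_url(42))  # 42 is a placeholder id, not a real file
```

Only URL construction is shown; the HTTP requests, error handling, and rate limiting are left out.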

@pdurbin
Member

pdurbin commented Dec 11, 2017

@cscn thanks for opening this issue. Can you please see if https://dataverse.harvard.edu/api/search?q=fileContentType%3Atype%2Fx-r-syntax helps you find the R files? I mentioned something similar in #3597

I haven't yet looked into the bug you're reporting about expected number of pages.
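
The search URL suggested above can also be built programmatically so the query is percent-encoded correctly. This is a minimal sketch; `build_search_url` is a hypothetical helper, and the optional `start` and `per_page` parameters are included on the assumption that they are accepted by the Search API as described in its guide.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://dataverse.harvard.edu/api/search"


def build_search_url(query, start=None, per_page=None):
    """Build a Search API URL, percent-encoding the query string."""
    params = {"q": query}
    if start is not None:
        params["start"] = start
    if per_page is not None:
        params["per_page"] = per_page
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Reproduces the URL from the comment above.
print(build_search_url("fileContentType:type/x-r-syntax"))
```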

@cscn
Author

cscn commented Dec 13, 2017

Thanks so much! This is exactly what I was looking for. It turns out the number of pages wasn't a bug at all, but rather my misinterpretation of the start parameter. The search API documentation states that the start parameter is

A cursor for paging through search results. See iteration example.

This led me to believe that the parameter was essentially a page number, when in fact it is a result offset: the position in the full result list at which the returned page begins. The example in the documentation demonstrates this, but it may be helpful to clarify the definition.
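
To make the distinction concrete, here is a sketch of paging by result offset rather than by page number. It assumes offsets start at 0, that each response's "data" object carries "total_count" (as noted earlier in this thread) and a list of results under "items" (an assumption here), and it takes a caller-supplied `fetch_page` function so the paging logic is independent of any HTTP library. All function names are hypothetical.

```python
def page_starts(total_count, per_page=10):
    """Offsets to pass as `start`: 0, per_page, 2 * per_page, ..."""
    return list(range(0, total_count, per_page))


def iter_results(fetch_page, per_page=10):
    """Yield every result, advancing `start` by per_page on each request.

    fetch_page(start, per_page) must return the response's "data" dict.
    """
    start = 0
    while True:
        data = fetch_page(start, per_page)
        items = data["items"]
        yield from items
        start += per_page
        # Stop on an empty page or once the offset passes the last result.
        if not items or start >= data["total_count"]:
            break


# Stand-in for a real request: pretends the server holds 25 results.
def fake_fetch(start, per_page):
    return {"total_count": 25, "items": list(range(start, min(start + per_page, 25)))}


print(page_starts(25))                     # prints [0, 10, 20]
print(len(list(iter_results(fake_fetch))))  # prints 25
```

With 25 results and 10 per page, the third request uses `start=20`, not `start=3`.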

@pdurbin
Member

pdurbin commented Dec 13, 2017

@cscn I'm glad your immediate problem is solved. I just flagged this issue to be about improving the API Guide. Are you interested in trying to come up with a different phrasing? The file to edit is https://github.com/IQSS/dataverse/blob/develop/doc/sphinx-guides/source/api/search.rst and I'd be happy to review a pull request. No pressure. 😄
