enhance input to api_get_provenance_metadata to accept urls and dois #17

Open
srearl opened this issue Jun 25, 2020 · 4 comments
Labels
enhancement New feature or request

Comments


srearl commented Jun 25, 2020

api_get_provenance_metadata is a fantastic resource, but I ran into a case where I needed provenance information and had the DOI and/or URL of the dataset rather than the project identifier (e.g., knb-lter-xxx.x.x). Below is an R-based MRE, using a dataset from BNZ, that I used to address this task. It seems that api_get_provenance_metadata would be more useful if it natively accepted a dataset DOI or URL in addition to the project identifier.

MRE (in R):

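library(rvest)
library(EDIutils)
library(EML)
library(dplyr)
library(stringr)

# DOI of the example BNZ dataset
url <- "https://doi.org/10.6073/pasta/31b32868ddbb099c4b5480fb00eb2481"

# read the landing page that the DOI points to
landingPage <- read_html(url)

# text of the landing page elements that hold the package ID
pageSubset <- landingPage %>%
  html_nodes(".no-list-style") %>%
  html_text()

# first entry mentioning "knb-lter-", cut at the first whitespace
packageId <- str_extract(grep("knb-lter-", pageSubset, value = TRUE)[[1]], "^\\S*")

# fetch the provenance metadata and drop the @context and @type fields
packageProv <- emld::as_emld(EDIutils::api_get_provenance_metadata(packageId))
packageProv$`@context` <- NULL
packageProv$`@type` <- NULL

# desired output
packageProv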

Contributor

clnsmth commented Jun 25, 2020

Thanks for this suggestion @srearl! I agree that DOIs and URLs may be more familiar to users, but I'm a little wary of adding (and maintaining) support for DOI and URL inputs to this function because:

1.) It creates a precedent for extending support to all other API functions
2.) URLs (if you mean data package URLs) may change and break workflows
3.) Package ID is conspicuously listed on the data package landing page

Can you tell me more about your use case and why package IDs may be challenging for users?

Author

srearl commented Jun 25, 2020

Hi Colin,

  1. I do not know this package very well but can sympathize with this point.
  2. True, and definitely a point to consider, but they change very rarely (I think rarely, anyway). If this is of interest to explore further, a compromise might be to support DOIs but not URLs. In my case (below), I was provided mostly DOIs.
  3. Indeed. However, this became an issue for me because I was provided ~30 DOIs (and a few URLs), too many to practically visit each landing page and harvest the package ID. The MRE that I provided was pulled from a script that I used to loop over the list.
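
For readers facing a similar list, the loop described above might look roughly like the sketch below. It is not the original script: `provenance_from_ref()` and `collect_provenance()` are hypothetical names, the landing-page scraping simply repeats the MRE (so it assumes the `.no-list-style` selector and a `knb-lter-` package ID), and a failed lookup is skipped with a warning rather than stopping the run.

```r
# Hypothetical wrapper around the MRE: find the package ID on the landing
# page of a DOI or URL, then request its provenance metadata.
provenance_from_ref <- function(ref) {
  landing_page <- rvest::read_html(ref)
  page_subset <- rvest::html_text(
    rvest::html_nodes(landing_page, ".no-list-style")
  )
  hits <- grep("knb-lter-", page_subset, value = TRUE)
  if (length(hits) == 0) {
    stop("no package ID found on landing page: ", ref)
  }
  # keep the leading run of non-whitespace characters, as in the MRE
  package_id <- regmatches(hits[[1]], regexpr("^[^[:space:]]*", hits[[1]]))
  EDIutils::api_get_provenance_metadata(package_id)
}

# Apply a fetch function to every reference. A reference that fails is
# reported with a warning and stored as NULL so the loop keeps going.
collect_provenance <- function(refs, fetch = provenance_from_ref) {
  results <- lapply(refs, function(ref) {
    tryCatch(
      fetch(ref),
      error = function(e) {
        warning("skipping ", ref, ": ", conditionMessage(e), call. = FALSE)
        NULL
      }
    )
  })
  names(results) <- refs
  results
}
```

Calling `collect_provenance(dois)` on a character vector of DOIs would return a list named by DOI. The `fetch` argument is there only so the loop can be exercised without network access.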

Contributor

clnsmth commented Jun 26, 2020

Agreed @srearl, manually parsing that list would be onerous. I'm moving this into the queue with the caveat that it should be implemented for all EDI API functions in this package.

@clnsmth clnsmth added the enhancement New feature or request label Jun 26, 2020
Contributor

clnsmth commented Jan 11, 2022

The least intrusive implementation here might be a mapping function that takes one of:

  • Data package identifier
  • Data package DOI
  • Data package URL

and returns the other two IDs, which can be passed to downstream functions.
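
A minimal sketch of the offline front end such a mapping function might need, in base R. All three function names are hypothetical and not part of EDIutils. The package ID pattern (scope.identifier.revision) and the `packageid=` query parameter are assumptions based on the examples in this thread and on typical portal URLs. The network lookups that would actually translate a DOI into a package ID, or the reverse, are deliberately left out.

```r
# Classify a data package reference as a package ID, a DOI, or a URL.
# The DOI test runs before the URL test because DOIs are often given
# as https://doi.org/... links.
classify_package_ref <- function(x) {
  x <- trimws(x)
  doi_pattern <- "^(https?://(dx\\.)?doi\\.org/|doi:)?10\\.[0-9]+/[^[:space:]]+$"
  if (grepl("^[a-z][a-z0-9-]*\\.[0-9]+\\.[0-9]+$", x)) {
    "package_id"
  } else if (grepl(doi_pattern, x, ignore.case = TRUE)) {
    "doi"
  } else if (grepl("^https?://", x)) {
    "url"
  } else {
    "unknown"
  }
}

# Strip a resolver prefix so that a DOI is always compared in bare form.
normalize_doi <- function(x) {
  sub("^(https?://(dx\\.)?doi\\.org/|doi:)", "", trimws(x), ignore.case = TRUE)
}

# Pull the package ID out of a URL carrying a packageid= query parameter
# (assumed URL shape); returns NA when the parameter is absent.
package_id_from_url <- function(url) {
  m <- regmatches(url, regexpr("packageid=[^&#]+", url, ignore.case = TRUE))
  if (length(m) == 0) {
    return(NA_character_)
  }
  sub("^packageid=", "", m, ignore.case = TRUE)
}

classify_package_ref("https://doi.org/10.6073/pasta/31b32868ddbb099c4b5480fb00eb2481")
# "doi"
normalize_doi("https://doi.org/10.6073/pasta/31b32868ddbb099c4b5480fb00eb2481")
# "10.6073/pasta/31b32868ddbb099c4b5480fb00eb2481"
```

Keeping classification separate from lookup would let the existing API functions stay package-ID only, with the mapping step applied before they are called.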
