adding user agent to enter the polite pool #173

Closed
poldham opened this issue Sep 13, 2018 · 7 comments · Fixed by #175

@poldham
Contributor

poldham commented Sep 13, 2018

Hi Scott and collaborators,

I have a question about how to add a user agent to the headers of a request. We have a team in India attempting to run a large query (for publications on India) that should return around 285,000 results. However, the query is timing out. Based on the Crossref documentation on degraded performance of the public API (https://github.com/CrossRef/rest-api-doc), this may be because no user agent is specified, so the requests are not placed in the polite pool. With httr that would look like: GET("https://api.crossref.org/works?query.author=richard+feynman", user_agent("poldham")), but I've been struggling to get that working in the rcrossref functions with either httr or crul.
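For reference, a slightly fuller version of the httr call above might look like the sketch below. It sends a contact email both in the User-Agent header and as a mailto query parameter, the two options the Crossref REST API docs describe for reaching the polite pool. The app name "my-india-repo/0.1", the email address, and the helpers polite_user_agent() and fetch_works() are hypothetical placeholders, not part of rcrossref.

```r
# Build a User-Agent string that carries a contact email.
# "my-india-repo/0.1" is a hypothetical app name; replace it and the email.
polite_user_agent <- function(email, app = "my-india-repo/0.1") {
  stopifnot(is.character(email), length(email) == 1,
            grepl("@", email, fixed = TRUE))
  sprintf("%s (mailto:%s)", app, email)
}

# Query the /works endpoint with httr, sending the email in the
# User-Agent header and also as a mailto query parameter.
fetch_works <- function(query_author, email, rows = 20) {
  httr::GET(
    "https://api.crossref.org/works",
    query = list(query.author = query_author, rows = rows, mailto = email),
    httr::user_agent(polite_user_agent(email))
  )
}

polite_user_agent("you@example.org")
# "my-india-repo/0.1 (mailto:you@example.org)"

# Needs network access:
# res <- fetch_works("richard feynman", "you@example.org")
# httr::status_code(res)
```

Inside rcrossref itself this header is built by the package, so the httr version is only useful for hand-rolled requests.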

It may be that I am missing something really obvious, but it also struck me that, given the issues facing the public Crossref API, future updates of rcrossref could include the user agent as an argument and encourage users into the polite pool. So that could perhaps be an enhancement. Apologies in advance if I have missed something obvious! All the best, Paul

@sckott
Contributor

sckott commented Sep 13, 2018

thanks for your question @poldham

I assume this is being used in a Shiny app? Can you share the session info for the server it's running on?

@poldham
Contributor Author

poldham commented Sep 13, 2018

Thanks @sckott
At the moment we're not aiming for Shiny; we're simply trying to pull the data back into a data frame as part of building an open literature repository. We would, though, want to pull in smaller update chunks at a later stage.

The session info is:

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] crul_0.6.0 httr_1.3.1 rcrossref_0.8.4

loaded via a namespace (and not attached):
[1] Rcpp_0.12.17 compiler_3.5.1 pillar_1.2.3 later_0.7.3
[5] plyr_1.8.4 bindr_0.1.1 tools_3.5.1 digest_0.6.15
[9] packrat_0.4.9-3 jsonlite_1.5 tibble_1.4.2 pkgconfig_2.0.1
[13] rlang_0.2.1 bibtex_0.4.2 shiny_1.1.0 curl_3.2
[17] bindrcpp_0.2.2 dplyr_0.7.6 stringr_1.3.1 knitr_1.20
[21] xml2_1.2.0 htmlwidgets_1.2 triebeard_0.3.0 DT_0.4
[25] tidyselect_0.2.4 httpcode_0.2.0 glue_1.2.0 R6_2.2.2
[29] purrr_0.2.5 magrittr_1.5 urltools_1.7.0 promises_1.0.1
[33] htmltools_0.3.6 assertthat_0.2.0 mime_0.5 xtable_1.8-2
[37] httpuv_1.4.4.2 stringi_1.2.3 miniUI_0.1.1.1

@sckott
Contributor

sckott commented Sep 13, 2018

It's possible that this is a polite pool issue. Can you have them run cr_works(limit = 1, verbose = TRUE) to see whether their email is being used to direct them into the polite pool? The HTTP headers will be printed to the console and should look like:

cr_works(limit = 1, verbose = TRUE)
> GET /works?rows=1 HTTP/1.1
Host: api.crossref.org
Accept-Encoding: gzip, deflate
Accept: application/json, text/xml, application/xml, */*
User-Agent: r-curl/3.2 crul/0.6.0 rOpenSci(rcrossref/0.8.4.9100) (mailto:myrmecocystus@gmail.com)
X-USER-AGENT: r-curl/3.2 crul/0.6.0 rOpenSci(rcrossref/0.8.4.9100) (mailto:myrmecocystus@gmail.com)

However, it may be that they aren't using the cursor for a large number of results. Are they using the cursor argument? That is used when the user wants a lot of results back; there are examples in the docs.
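As a rough illustration of deep paging: the Crossref API returns at most 1,000 rows per request, so ~285,000 results means a few hundred sequential cursor requests. The sketch below assumes rcrossref's cr_works() arguments cursor, cursor_max, and limit; the query string and the helpers pages_needed() and pull_all_works() are hypothetical. cursor_max caps the total records fetched, so it needs to be at least the expected result count.

```r
# Number of cursor requests needed at a given page size.
pages_needed <- function(total_results, rows_per_page = 1000) {
  ceiling(total_results / rows_per_page)
}

pages_needed(285000)  # 285

# Cursor-based pull with rcrossref (needs network access and a
# crossref_email set for the polite pool). Query text is a placeholder.
pull_all_works <- function(query, max_results) {
  rcrossref::cr_works(
    query = query,
    cursor = "*",              # "*" starts a new cursor
    cursor_max = max_results,  # stop after this many records in total
    limit = 1000               # rows per request
  )
}

# res <- pull_all_works("india biodiversity", 300000)
# nrow(res$data)
```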

@poldham
Contributor Author

poldham commented Sep 13, 2018

Many thanks @sckott. I have a call with the India team in the morning, so I will check then. But running that on my machine I get:

GET /works?rows=1 HTTP/1.1
Host: api.crossref.org
Accept-Encoding: gzip, deflate
Accept: application/json, text/xml, application/xml, */*
User-Agent: r-curl/3.2 crul/0.6.0 rOpenSci(rcrossref/0.8.4)
X-USER-AGENT: r-curl/3.2 crul/0.6.0 rOpenSci(rcrossref/0.8.4)

So the mailto element is not being set. I'll look at the docs again to check whether I missed something on setting the user agent.
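The same check can be done on the header text itself. This is a small sketch with a hypothetical helper, has_mailto(), applied to the two User-Agent lines posted in this thread; it only inspects strings and does not contact the API.

```r
# TRUE if a User-Agent header line carries a "(mailto:user@host)" contact.
has_mailto <- function(header_line) {
  grepl("\\(mailto:[^)]+@[^)]+\\)", header_line)
}

# Header lines copied from this thread
with_email <- "User-Agent: r-curl/3.2 crul/0.6.0 rOpenSci(rcrossref/0.8.4.9100) (mailto:myrmecocystus@gmail.com)"
without_email <- "User-Agent: r-curl/3.2 crul/0.6.0 rOpenSci(rcrossref/0.8.4)"

has_mailto(with_email)     # TRUE
has_mailto(without_email)  # FALSE
```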

@sckott
Contributor

sckott commented Sep 13, 2018

I think the docs you need are in ?rcrossref-package
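Per the rcrossref docs, the package reads the contact email from the crossref_email environment variable and adds it as "(mailto:...)" to its User-Agent. A minimal sketch, assuming a placeholder address; email_is_set() is a hypothetical helper for checking that the variable is visible to the session.

```r
# Set the email for the current R session (placeholder address).
Sys.setenv(crossref_email = "name@example.org")

# For a persistent setting, put this line in ~/.Renviron instead:
# crossref_email=name@example.org

# Confirm the variable is visible to this session.
email_is_set <- function() nzchar(Sys.getenv("crossref_email"))
email_is_set()  # TRUE

# Then re-run the header check (needs network access); the User-Agent
# line should now end with (mailto:name@example.org):
# rcrossref::cr_works(limit = 1, verbose = TRUE)
```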

@poldham
Copy link
Contributor Author

poldham commented Sep 13, 2018

Many thanks Scott, I had a feeling I was missing something. That was exactly the bit of the documentation I hadn't read! Tomorrow I will write a giant poster-sized note to myself to read all the documentation first! Thanks so much for your time, and I will let you know when we move the India biodiversity repository along. The cursor point is noted as well; I'm sure the team was using the cursor, but I will cross-check. All the best, Paul

@poldham poldham closed this as completed Sep 13, 2018
@sckott
Contributor

sckott commented Sep 13, 2018

Glad it's sorted out. We should link to the polite pool docs from the functions; will add that.

2 participants