Error with classification() #783

Closed
jordancasey opened this issue Nov 7, 2019 · 12 comments

@jordancasey

When I run the classification() function, I get an error, approximately 20% of the time:

Error in curl::curl_fetch_memory(x$url$url, handle = x$url$handle) :
Error in the HTTP2 framing layer

I tried to fix this using:

httr::set_config(httr::config(http_version = 0))

I also tried to specify NCBI:

classif <- classification(t, db="ncbi")

None of those fixes work. Any thoughts on how to fix this error?

Here's a reproducible example (again, the error message only pops up ~20% of the time, and it is unrelated to the queried taxa):

classification("Teleostei", db = "ncbi")

Here's my session info:

sessionInfo()

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
[5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8 LC_PAPER=fr_FR.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] httr_1.4.1 curl_4.2 stringr_1.4.0 rentrez_1.2.2 taxize_0.9.9 purrr_0.3.3 dplyr_0.8.3

loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 pillar_1.4.2 compiler_3.6.1 plyr_1.8.4 iterators_1.0.12 tools_3.6.1
[7] jsonlite_1.6 tibble_2.1.3 nlme_3.1-141 lattice_0.20-38 pkgconfig_2.0.3 rlang_0.4.1
[13] foreach_1.4.7 cli_1.1.0 crul_0.9.0 parallel_3.6.1 xml2_1.2.2 triebeard_0.3.0
[19] grid_3.6.1 tidyselect_0.2.5 reshape_0.8.8 glue_1.3.1 httpcode_0.2.0 data.table_1.12.6
[25] R6_2.4.0 XML_3.98-1.20 reshape2_1.4.3 magrittr_1.5 urltools_1.7.3 codetools_0.2-16
[31] assertthat_0.2.1 bold_0.9.0 ape_5.3 stringi_1.4.3 crayon_1.3.4 zoo_1.8-6

Thanks!

@sckott
Contributor

sckott commented Nov 7, 2019

thanks for the report @jordancasey !

I tried to fix this using: httr::set_config(httr::config(http_version = 0))

that wouldn't work because taxize's underlying HTTP package is crul (https://github.com/ropensci/crul), not httr - you can achieve the same thing by passing the curl option directly, e.g., classification("Teleostei", db = "ncbi", http_version = 0)

it may have been fixed in the latest version of curl, but for now the only thing we can try is opting out of http/2, which is what the curl maintainer suggested too. try

classification("Teleostei", db = "ncbi", http_version = 0L, verbose = TRUE)

with verbose=TRUE you get verbose curl output, and you can see what HTTP version was used in the request, e.g.,

> GET /entrez/eutils/esearch.fcgi?db=taxonomy&term=Teleostei HTTP/1.1 <== HERE
Host: eutils.ncbi.nlm.nih.gov

And when you don't force to http/1, you would see:

> GET /entrez/eutils/esearch.fcgi?db=taxonomy&term=Teleostei HTTP/2 <== HERE
Host: eutils.ncbi.nlm.nih.gov

I was able to reproduce that framing layer error on macOS at least once when using http/2, so I sometimes get the same thing you do
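
When checking many verbose logs for the protocol that was actually negotiated, the request line can be picked out mechanically. The following is a minimal sketch, not part of taxize or crul: the helper name `request_http_version` is hypothetical, and it assumes the verbose output has already been captured as a character vector of lines.

```r
# Hypothetical helper: extract the HTTP version from curl's verbose request
# line(s), i.e. lines such as "> GET /path HTTP/1.1".
request_http_version <- function(lines) {
  # Request lines start with "> " followed by an upper-case HTTP method
  req <- grep("^> [A-Z]+ ", lines, value = TRUE)
  # Keep only the "HTTP/x" token near the end of each request line
  sub("^.* (HTTP/[0-9.]+).*$", "\\1", req)
}

log_lines <- c(
  "> GET /entrez/eutils/esearch.fcgi?db=taxonomy&term=Teleostei HTTP/1.1",
  "Host: eutils.ncbi.nlm.nih.gov"
)
request_http_version(log_lines)
```

A result of "HTTP/2" for a request that later fails would match the failing runs reported in this thread.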

@jordancasey
Author

jordancasey commented Nov 8, 2019

Hi @sckott - thanks for your reply!

I've tried to force http/1, which works sometimes, but other times it still reverts to http/2 and fails.

I'm trying to use taxize to automate filling in the taxonomy of a list of 16,000 taxa, which is why it's problematic when it fails sometimes.

When I try to use http_version = 0L, the failed output is:

> classification("Teleostei", db = "ncbi", http_version = 0L, verbose = TRUE)
══  1 queries  ═══════════════

Retrieving data for taxon 'Teleostei'

* Found bundle for host eutils.ncbi.nlm.nih.gov: 0x101dca350 [can multiplex]
* Re-using existing connection! (#186) with host eutils.ncbi.nlm.nih.gov
* Connected to eutils.ncbi.nlm.nih.gov (2607:f220:41e:4290::110) port 443 (#186)
* Using Stream ID: 5 (easy handle 0x10763f800)
> GET /entrez/eutils/esearch.fcgi?db=taxonomy&term=Teleostei&api_key=secret HTTP/2
Host: eutils.ncbi.nlm.nih.gov
Accept-Encoding: gzip, deflate
Accept: application/json, text/xml, application/xml, */*
User-Agent: r-curl/4.2 crul/0.8.4 rOpenSci(taxize/0.9.9)
X-USER-AGENT: r-curl/4.2 crul/0.8.4 rOpenSci(taxize/0.9.9)

< HTTP/2 200 
< date: Fri, 08 Nov 2019 10:10:34 GMT
< server: Finatra
< strict-transport-security: max-age=31536000; includeSubDomains; preload
< content-security-policy: upgrade-insecure-requests
< x-ratelimit-remaining: 9
< ncbi-phid: 322C591853035CF50000299F6251D536.1.1.m_1
< cache-control: private
< ncbi-sid: 2430B08F031608C6_9C20SID
< content-encoding: gzip
< x-ratelimit-limit: 10
< access-control-allow-origin: *
< content-type: text/xml; charset=UTF-8
* Added cookie ncbi_sid="2430B08F031608C6_9C20SID" for domain nih.gov, path /, expire 1604830235
< set-cookie: ncbi_sid=2430B08F031608C6_9C20SID; domain=.nih.gov; path=/; expires=Sun, 08 Nov 2020 10:10:35 GMT
< x-ua-compatible: IE=Edge
< x-xss-protection: 1; mode=block
< 
* Connection #186 to host eutils.ncbi.nlm.nih.gov left intact
Found:  Teleostei
══  Results  ═════════════════

● Total: 1 
● Found: 1 
● Not Found: 0
Error in curl::curl_fetch_memory(x$url$url, handle = x$url$handle) : 
  Error in the HTTP2 framing layer

When I try to use http_version = 1.1, the failed output is:

> classification("Teleostei", db = "ncbi", http_version = 1.1, verbose = TRUE)
══  1 queries  ═══════════════

Retrieving data for taxon 'Teleostei'

* Found bundle for host eutils.ncbi.nlm.nih.gov: 0x101ddd190 [can multiplex]
* Re-using existing connection! (#180) with host eutils.ncbi.nlm.nih.gov
* Connected to eutils.ncbi.nlm.nih.gov (130.14.29.110) port 443 (#180)
* Using Stream ID: 3 (easy handle 0x114a5ea00)
> GET /entrez/eutils/esearch.fcgi?db=taxonomy&term=Teleostei&api_key=secret HTTP/2
Host: eutils.ncbi.nlm.nih.gov
Accept-Encoding: gzip, deflate
Accept: application/json, text/xml, application/xml, */*
User-Agent: r-curl/4.2 crul/0.8.4 rOpenSci(taxize/0.9.9)
X-USER-AGENT: r-curl/4.2 crul/0.8.4 rOpenSci(taxize/0.9.9)

< HTTP/2 200 
< date: Fri, 08 Nov 2019 10:03:35 GMT
< server: Finatra
< strict-transport-security: max-age=31536000; includeSubDomains; preload
< content-security-policy: upgrade-insecure-requests
< x-ratelimit-remaining: 9
< ncbi-phid: D0BD50AE07953D85000051BCA3232334.1.1.m_1
< cache-control: private
< ncbi-sid: 3A1D96765A95B8F0_4A9BSID
< content-encoding: gzip
< x-ratelimit-limit: 10
< access-control-allow-origin: *
< content-type: text/xml; charset=UTF-8
* Added cookie ncbi_sid="3A1D96765A95B8F0_4A9BSID" for domain nih.gov, path /, expire 1604829816
< set-cookie: ncbi_sid=3A1D96765A95B8F0_4A9BSID; domain=.nih.gov; path=/; expires=Sun, 08 Nov 2020 10:03:36 GMT
< x-ua-compatible: IE=Edge
< x-xss-protection: 1; mode=block
< 
* Connection #180 to host eutils.ncbi.nlm.nih.gov left intact
Found:  Teleostei
══  Results  ═════════════════

● Total: 1 
● Found: 1 
● Not Found: 0
Error in curl::curl_fetch_memory(x$url$url, handle = x$url$handle) : 
  Error in the HTTP2 framing layer

Even when it works with http_version = 0L, the http version is reported as HTTP/2.0.

> classification("Teleostei", db = "ncbi", http_version = 0L, verbose = TRUE)
> GET /entrez/eutils/esearch.fcgi?db=taxonomy&term=Teleostei&api_key=secret HTTP/2

When it works with http_version = 1.1, the http version is reported as HTTP/1.0.

> classification("Teleostei", db = "ncbi", http_version = 1.1, verbose = TRUE)
> GET /entrez/eutils/esearch.fcgi?db=taxonomy&term=Teleostei&api_key=secret HTTP/1.0
Host: eutils.ncbi.nlm.nih.gov

I'm not sure whether that's relevant?

Also, I'm working on a different system today (although I get the same errors on the system I used yesterday):

Session Info
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] taxize_0.9.9 purrr_0.3.2  dplyr_0.8.3 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2        pillar_1.4.2      compiler_3.6.1    plyr_1.8.4        iterators_1.0.12 
 [6] tools_3.6.1       jsonlite_1.6      tibble_2.1.3      nlme_3.1-141      lattice_0.20-38  
[11] pkgconfig_2.0.3   rlang_0.4.1       foreach_1.4.7     cli_1.1.0         rstudioapi_0.10  
[16] crul_0.8.4        curl_4.2          yaml_2.2.0        parallel_3.6.1    httr_1.4.1       
[21] stringr_1.4.0     xml2_1.2.2        triebeard_0.3.0   grid_3.6.1        tidyselect_0.2.5 
[26] reshape_0.8.8     glue_1.3.1        httpcode_0.2.0    data.table_1.12.6 R6_2.4.0         
[31] reshape2_1.4.3    magrittr_1.5      urltools_1.7.3    codetools_0.2-16  assertthat_0.2.1 
[36] bold_0.9.0        ape_5.3           stringi_1.4.3     crayon_1.3.4      zoo_1.8-6    

Thanks for any advice!

@sckott
Contributor

sckott commented Nov 8, 2019

(hope you don't mind, i edited your reply to make it easier to see the code chunks; and I replaced your api key with secret - you don't have to, but it's a good idea to get a new API key as the one you shared here is on the open internet now :( )

i'm guessing 1.1 isn't a valid value to pass to http_version, but don't know for sure. more comments soon

@sckott
Contributor

sckott commented Nov 8, 2019

On a related note: in recent versions of taxize, the get_*() functions, which classification() uses internally when you pass in a name instead of a taxon ID, now keep track of queries and let you pick up where you stopped if there is an error (as long as R doesn't crash). See ?taxon-state in R when taxize is loaded. One approach: use get_uid() (if you're using NCBI) to get your IDs first, then once those are all sorted out, pass them to classification(); that way you at least won't have to worry about the interactive name prompt. This doesn't solve the HTTP framing layer error per se, but it lets you keep going without starting over when an error does happen.
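
For a long batch, the two-step approach described above could be combined with a simple retry around each network call. This is only a sketch under stated assumptions: `with_retry` is a hypothetical helper, not a taxize function, and the taxize calls are left commented out because they need network access. The runnable part uses a stand-in function that fails twice before succeeding.

```r
# Hypothetical helper: call f(), retrying up to `tries` times when it errors.
with_retry <- function(f, tries = 3, wait = 1) {
  for (i in seq_len(tries)) {
    # Turn an error into a value so the loop can decide what to do next
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("attempt ", i, " failed: ", conditionMessage(result))
    if (i < tries) Sys.sleep(wait)
  }
  stop(result)  # re-signal the last error once all attempts are used up
}

# Offline stand-in for an intermittently failing request
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("Error in the HTTP2 framing layer")
  "ok"
}
out <- with_retry(flaky, tries = 5, wait = 0)

# Assumed usage with taxize (requires network; not run here):
# library(taxize)
# taxa <- c("Teleostei", "Actinopterygii")
# uids <- with_retry(function() get_uid(taxa, ask = FALSE))
# cls  <- with_retry(function() classification(uids))
```

Retrying only hides the intermittent failure; it does not address whatever causes the framing layer error.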

@sckott
Contributor

sckott commented Nov 8, 2019

@jordancasey Try http_version = 2L - that should force to http/1.1 - let me know if that works
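
For context, the integer given to http_version corresponds to libcurl's CURL_HTTP_VERSION_* constants. The sketch below lists the values as defined in libcurl's headers; the lookup helper `describe_http_version` is hypothetical, and the truncation of 1.1 to 1 is an assumption about how a non-integer value would be coerced before reaching libcurl.

```r
# libcurl HTTP version constants (values from libcurl's curl.h)
curl_http_versions <- c(
  CURL_HTTP_VERSION_NONE = 0L,  # no preference: libcurl picks the version
  CURL_HTTP_VERSION_1_0  = 1L,  # force HTTP/1.0
  CURL_HTTP_VERSION_1_1  = 2L,  # force HTTP/1.1
  CURL_HTTP_VERSION_2_0  = 3L,  # attempt HTTP/2
  CURL_HTTP_VERSION_2TLS = 4L   # attempt HTTP/2 over TLS only
)

# Hypothetical helper: name the constant an http_version value maps to.
# as.integer() truncates, so 1.1 is treated as 1 here (an assumption).
describe_http_version <- function(x) {
  hit <- names(curl_http_versions)[curl_http_versions == as.integer(x)]
  if (length(hit) == 0) "unknown" else hit
}

describe_http_version(0L)
describe_http_version(2L)
describe_http_version(1.1)

# If the curl package is installed, curl::curl_symbols("HTTP_VERSION")
# should list the values known to the local libcurl build.
```

Under these assumptions, 0L leaves the choice to libcurl rather than opting out of HTTP/2, and 1.1 would map to HTTP/1.0, which is consistent with the request lines reported earlier in the thread.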

@jordancasey
Author

Hi @sckott - thanks for editing my code & api key (oops!)

When I use http_version = 2L, it works sometimes, but like before, sometimes it reverts to http/2 and fails. This is the first time that it successfully forces http/1.1 sometimes, so at least there's progress in the right direction.

Output when it works:

> GET /entrez/eutils/esearch.fcgi?db=taxonomy&term=Teleostei&api_key=secret HTTP/1.1
Host: eutils.ncbi.nlm.nih.gov

Output when it fails:

> GET /entrez/eutils/esearch.fcgi?db=taxonomy&term=Teleostei&api_key=secret HTTP/2
Host: eutils.ncbi.nlm.nih.gov

 Error in curl::curl_fetch_memory(x$url$url, handle = x$url$handle) : 
  Error in the HTTP2 framing layer 

However, I did manage to run my script on a colleague's computer, without even having to specify http version. It always used http/1.1 automatically.

Here's her Session Info:

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=fr_FR.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=fr_FR.UTF-8       LC_NAME=C                
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] taxize_0.9.9 purrr_0.3.3  dplyr_0.8.3

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2        pillar_1.4.2      compiler_3.6.1    plyr_1.8.4        iterators_1.0.12  tools_3.6.1      
 [7] jsonlite_1.6      tibble_2.1.3      nlme_3.1-139      lattice_0.20-38   pkgconfig_2.0.3   rlang_0.4.1      
[13] foreach_1.4.7     cli_1.1.0         rstudioapi_0.10   crul_0.8.4        curl_4.2          yaml_2.2.0      
[19] parallel_3.6.1    stringr_1.4.0     xml2_1.2.2        grid_3.6.1        tidyselect_0.2.5  reshape_0.8.8    
[25] glue_1.3.1        httpcode_0.2.0    data.table_1.12.2 R6_2.4.0          reshape2_1.4.3    magrittr_1.5    
[31] codetools_0.2-16  assertthat_0.2.1  bold_0.9.0        ape_5.3           stringi_1.4.3     crayon_1.3.4    
[37] zoo_1.8-5

@sckott
Contributor

sckott commented Nov 13, 2019

thanks @jordancasey - glad there's progress.

I'm considering hard-coding forcing to http 1.1 for NCBI requests throughout the package. I'll ping you soon

sckott added this to the v0.9.91 milestone Nov 13, 2019
sckott added the bug label Nov 13, 2019
sckott closed this as completed in c03b48d Nov 13, 2019

@sckott
Contributor

sckott commented Nov 13, 2019

it's hard coded now to always do http 1.1 requests for all ncbi requests across the pkg, let me know if you still have problems

@jordancasey
Author

Hi Scott, Thanks for continuing to work on this. I've updated to v0.9.91:

other attached packages:
[1] taxize_0.9.91 purrr_0.3.3   dplyr_0.8.3

Unfortunately, it sometimes still reverts to http/2 (I've also run this without specifying http_version with the same results). Here's a successful run followed by a failed run:

> classification("Teleostei", db = "ncbi", http_version = 2L, verbose = TRUE)
══  1 queries  ═══════════════

Retrieving data for taxon 'Teleostei'

* Found bundle for host eutils.ncbi.nlm.nih.gov: 0x101bdf250 [can pipeline]
* Re-using existing connection! (#11) with host eutils.ncbi.nlm.nih.gov
* Connected to eutils.ncbi.nlm.nih.gov (2607:f220:41e:4290::110) port 443 (#11)
> GET /entrez/eutils/esearch.fcgi?db=taxonomy&term=Teleostei&api_key=secret HTTP/1.1
Host: eutils.ncbi.nlm.nih.gov
Accept-Encoding: gzip, deflate
Accept: application/json, text/xml, application/xml, */*
User-Agent: r-curl/4.2 crul/0.8.4 rOpenSci(taxize/0.9.9)
X-USER-AGENT: r-curl/4.2 crul/0.8.4 rOpenSci(taxize/0.9.9)

< HTTP/1.1 200 OK
< Date: Thu, 14 Nov 2019 10:39:45 GMT
< Server: Finatra
< Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
< Content-Security-Policy: upgrade-insecure-requests
< X-RateLimit-Remaining: 9
< NCBI-PHID: 322C40C6E58C0D7500003AAA619303E3.1.1.m_1
< Cache-Control: private
< NCBI-SID: 7DDBF536B1B14BD6_CF49SID
< content-encoding: gzip
< X-RateLimit-Limit: 10
< Access-Control-Allow-Origin: *
< Content-Type: text/xml; charset=UTF-8
* Added cookie ncbi_sid="7DDBF536B1B14BD6_CF49SID" for domain nih.gov, path /, expire 1605350385
< Set-Cookie: ncbi_sid=7DDBF536B1B14BD6_CF49SID; domain=.nih.gov; path=/; expires=Sat, 14 Nov 2020 10:39:45 GMT
< X-UA-Compatible: IE=Edge
< X-XSS-Protection: 1; mode=block
< Connection: close
< Transfer-Encoding: chunked
< 
* Closing connection 11
✔  Found:  Teleostei
══  Results  ═════════════════

● Total: 1 
● Found: 1 
● Not Found: 0
$Teleostei
                 name         rank     id
1  cellular organisms      no rank 131567
2           Eukaryota superkingdom   2759
3        Opisthokonta      no rank  33154
4             Metazoa      kingdom  33208
5           Eumetazoa      no rank   6072
6           Bilateria      no rank  33213
7       Deuterostomia      no rank  33511
8            Chordata       phylum   7711
9            Craniata    subphylum  89593
10         Vertebrata      no rank   7742
11      Gnathostomata      no rank   7776
12         Teleostomi      no rank 117570
13       Euteleostomi      no rank 117571
14     Actinopterygii   superclass   7898
15        Actinopteri        class 186623
16        Neopterygii     subclass  41665
17          Teleostei   infraclass  32443

attr(,"class")
[1] "classification"
attr(,"db")
[1] "ncbi"
> classification("Teleostei", db = "ncbi", http_version = 2L, verbose = TRUE)
══  1 queries  ═══════════════

Retrieving data for taxon 'Teleostei'

* Found bundle for host eutils.ncbi.nlm.nih.gov: 0x101cb9bc0 [can multiplex]
* Re-using existing connection! (#12) with host eutils.ncbi.nlm.nih.gov
* Connected to eutils.ncbi.nlm.nih.gov (130.14.29.110) port 443 (#12)
* Using Stream ID: 3 (easy handle 0x10e26fa00)
> GET /entrez/eutils/esearch.fcgi?db=taxonomy&term=Teleostei&api_key=secret HTTP/2
Host: eutils.ncbi.nlm.nih.gov
Accept-Encoding: gzip, deflate
Accept: application/json, text/xml, application/xml, */*
User-Agent: r-curl/4.2 crul/0.8.4 rOpenSci(taxize/0.9.9)
X-USER-AGENT: r-curl/4.2 crul/0.8.4 rOpenSci(taxize/0.9.9)

< HTTP/2 200 
< date: Thu, 14 Nov 2019 10:39:47 GMT
< server: Finatra
< strict-transport-security: max-age=31536000; includeSubDomains; preload
< content-security-policy: upgrade-insecure-requests
< x-ratelimit-remaining: 9
< ncbi-phid: D0BD6B6D0E30E6D5000028CD2392F682.1.1.m_1
< cache-control: private
< ncbi-sid: A9FC190898BD17EB_1069SID
< content-encoding: gzip
< x-ratelimit-limit: 10
< access-control-allow-origin: *
< content-type: text/xml; charset=UTF-8
* Added cookie ncbi_sid="A9FC190898BD17EB_1069SID" for domain nih.gov, path /, expire 1605350387
< set-cookie: ncbi_sid=A9FC190898BD17EB_1069SID; domain=.nih.gov; path=/; expires=Sat, 14 Nov 2020 10:39:47 GMT
< x-ua-compatible: IE=Edge
< x-xss-protection: 1; mode=block
< 
* Connection #12 to host eutils.ncbi.nlm.nih.gov left intact
✔  Found:  Teleostei
══  Results  ═════════════════

● Total: 1 
● Found: 1 
● Not Found: 0
Error in curl::curl_fetch_memory(x$url$url, handle = x$url$handle) : 
  Error in the HTTP2 framing layer

@sckott
Contributor

sckott commented Nov 14, 2019

hmm, the user agent string shows that you are still using taxize v0.9.9 - did you restart your R session? what does packageVersion("taxize") give you?
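
One way to spot a session that still has an old copy of a package loaded is to compare the version recorded when the namespace was loaded with the version installed on disk. This is a sketch, not something from taxize: `loaded_vs_installed` is a hypothetical helper, and it assumes that getNamespaceVersion() reflects the copy loaded in the session while packageVersion() reads the installed DESCRIPTION file. It is demonstrated on the stats package so that it runs anywhere.

```r
# Hypothetical helper: report the loaded and installed versions of a package.
# Returns NULL if the package's namespace is not loaded in this session.
loaded_vs_installed <- function(pkg) {
  if (!isNamespaceLoaded(pkg)) return(NULL)
  list(
    loaded    = unname(getNamespaceVersion(pkg)),        # recorded at load time
    installed = as.character(utils::packageVersion(pkg)) # read from the library
  )
}

v <- loaded_vs_installed("stats")
v$loaded == v$installed

# Assumed usage after upgrading taxize without restarting R:
# loaded_vs_installed("taxize")
```

If the two values differ, restarting R loads the newly installed copy.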

@jordancasey
Author

indeed, R just needed a proper restart. It's working perfectly now! Thanks, Scott!

@sckott
Contributor

sckott commented Nov 14, 2019

great, glad it works. to be clear, http_version=2L is already in the ncbi requests, so you don't need to pass it in anymore
