xml_url does not work (at least in combination with base_url) #300

Closed
mwaldstein opened this issue Apr 15, 2020 · 2 comments

mwaldstein commented Apr 15, 2020

The behavior of xml_url appears to have changed pretty drastically in v1.3.0.

Previously, passing base_url to read_html() set the URL that xml_url() later returned.

The examples below are heavily simplified, but they show the change in behavior. I can work around the change from "NA" to "<CHARSXP: NA>", but losing the URL from xml_url is a bigger problem and will require some re-architecting.

This breaks edgarWebR (currently off CRAN because its vignettes make remote API calls).

Using string input

Example 1

require(xml2)
doc <- read_html("<html/>", base_url = "http://test.com")
xml_url(doc)

On v1.2.5 the output was "http://test.com"
On v1.3.1 the output is "UTF-8"

Example 2

require(xml2)
doc <- read_html("<html/>")
xml_url(doc)

On v1.2.5 the output was "NA"
On v1.3.0 the output is "UTF-8"

Using httr response

Example 3

require(xml2)
require(httr)
href <- "https://www.sec.gov/cgi-bin/cik_lookup?company=cloudera"
res <- GET(href)
doc <- read_html(res, base_url = href)
xml_url(doc)

On v1.2.5 the output was "https://www.sec.gov/cgi-bin/cik_lookup?company=cloudera"
On v1.3.0 the output is "<CHARSXP: NA>"

Example 4

require(xml2)
require(httr)
href <- "https://www.sec.gov/cgi-bin/cik_lookup?company=cloudera"
res <- GET(href)
doc <- read_html(res)
xml_url(doc)

On v1.2.5 the output was "NA"
On v1.3.0 the output is "<CHARSXP: NA>"

@jimhester
Member

Thanks, this was a bug introduced when converting the code away from Rcpp; it should now be fixed. We will be doing another xml2 release in the near future. If you notice any additional regressions, please let us know!

@mwaldstein
Author

Thanks! I appreciate the responsiveness.
