[Bug?] unknown encoding ASCII-8BIT #553
Comments
It's [Bug #5126] and should have been backported to 1.9.3rc1. |
Ruby 1.9.3 RC1 is already released, and it includes the fix. |
The above code works fine with 1.9.3-rc1. But the following code fails with the same output on both 1.9.2 and 1.9.3-rc1:
require "nokogiri"
s = "ee"
s.force_encoding "ASCII-8BIT"
puts Nokogiri::HTML::DocumentFragment.parse(s).to_s.inspect |
Using ruby-1.9.3-p0, @bogdan's code reproduces the error as well. |
Yes, I am using ruby-1.9.3-p0 and getting the same error. |
I think I'm getting similar errors during tests:
It's caused when the LANG environment variable is set to something unusual (for example, if you run builds with LANG=C). |
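To check whether the locale is a factor on a given machine, it can help to print what Ruby derived from the environment. The following is a minimal diagnostic sketch using only core Ruby; `encoding_report` is a hypothetical helper name, not something from Nokogiri or this thread. Ruby derives `Encoding.default_external` from the locale, but whether that is what triggers the error here is the commenter's hypothesis, not something this snippet proves.

```ruby
# Hypothetical helper: summarize the encodings Ruby picked up from the
# environment, plus the encoding tag on one particular string.
def encoding_report(str)
  {
    lang: ENV["LANG"],                                # raw locale setting, may be nil
    default_external: Encoding.default_external,      # derived from the locale
    binary: str.encoding == Encoding::ASCII_8BIT,     # true for binary-tagged strings
    valid: str.valid_encoding?                        # bytes valid for the current tag?
  }
end

# String#b returns a binary-tagged copy, similar to force_encoding("ASCII-8BIT").
s = "ee".b
p encoding_report(s)
```

Running this once normally and once with `LANG=C` set in the shell shows whether the default external encoding changes between the two environments.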
My $LANG is set to |
I have the same error when parsing a BBC page using the following code:
require 'open-uri'
require 'nokogiri'
url = open('http://www.bbc.co.uk/arabic/business/2013/05/130513_us_obama_tax.shtml')
document = url.to_a.join ''
noko = Nokogiri::HTML.fragment(document)
noko.to_s
It produces a whole lot of
Any ideas how to solve it?
|
Can't reproduce with 1.9.3p448 or jruby 1.7.6. I'll go ahead and close this issue. Please reopen it if you can still reproduce the bug. Cheers, |
I am still seeing the exact same problem as blazeeboy when running his test code. Ruby just installed via rvm on an Ubuntu machine:
and the same with a ruby 2.0.0 environment:
This is with nokogiri 1.6.0 (which seems to be the latest?). |
I also got this same output as part of a heroku log, but could not reproduce the issue in the console (but can reliably reproduce in the application). This is caused by a special character Using |
I'm also getting this on 2.0.0p353 - has anyone been able to find a fix? |
Just tried 2.0.0-p481. Same issue :/ |
Getting this on This happens when using |
Reopening. Will investigate. |
Closing, never could reproduce. Happy to re-open if someone can help me reproduce the problem. |
@flavorjones Reproducing the problem is not hard.
require 'nokogiri'
s = String.new 'hello', encoding: Encoding::ASCII_8BIT
p xml: Nokogiri::XML::DocumentFragment.parse(s).to_s
#=> {:xml=>"hello"}
p html: Nokogiri::HTML::DocumentFragment.parse(s).to_s
#=> output error : unknown encoding ASCII-8BIT
#=> {:html=>""}
Seems like |
@amatsuda Thank you for helping me reproduce. |
Any update available? |
I'm getting this only on Heroku on ruby 2.3.3. Is there any way to workaround it? @yagudaev did you find any more specifics about why it would just happen on Heroku? |
@pmackay this is reproducible in ruby 2.4.0. It's definitely not heroku-specific or 2.3.3-specific. |
It's also specific to HTML::DocumentFragment. A potential workaround for now is to use HTML::Document or else XML::DocumentFragment if possible. |
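Another approach that may be worth trying, not confirmed anywhere in this thread, is to re-tag the binary string as UTF-8 before handing it to the parser. The sketch below assumes that `HTML::DocumentFragment.parse` picks up the encoding name from the string's own tag, which is an assumption on my part. `retag_binary_as_utf8` is a hypothetical helper, not a Nokogiri API, and it uses only core Ruby.

```ruby
# Hypothetical helper (not part of Nokogiri): if the string is tagged
# ASCII-8BIT and its bytes are valid UTF-8, return a UTF-8-tagged copy.
# Otherwise return the string untouched. The input is never mutated.
def retag_binary_as_utf8(str)
  return str unless str.encoding == Encoding::ASCII_8BIT

  candidate = str.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : str
end

s = String.new("hello", encoding: Encoding::ASCII_8BIT)
fixed = retag_binary_as_utf8(s)
puts fixed.encoding
#=> UTF-8

# Assumed usage with Nokogiri (left commented out so the sketch runs without the gem):
# require "nokogiri"
# Nokogiri::HTML::DocumentFragment.parse(fixed).to_s
```

Note that `force_encoding` only changes the tag and leaves the bytes alone, so this is only appropriate when the data really is UTF-8. Strings whose bytes are not valid UTF-8 are returned still tagged as binary and would need an explicit decision about their real encoding.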
Since Is this happening upstream or within Nokogiri? |
I am using ruby 1.9.3-preview1 and nokogiri 1.5.0.
Outputs:
It is very strange that if an empty string is interpolated inline
"e#{""}"
- there is no error.
libxml 2.7.8.dfsg-2ubuntu0.1