Codec json_lines initializes charset incorrectly causing it to be ignored #6

Closed
jordansissel opened this issue May 18, 2015 · 2 comments

Comments

@jordansissel
Contributor

(This issue was originally filed by @fira at elastic/logstash#3065)


I've been trying to get NXLog to work with Logstash for days now, and I couldn't get the charsets straight.

What bugged me was that Logstash would always complain about input "not being valid UTF-8" even when I set it to expect CP1252. Even stranger, some people said things broke starting with 1.2, and that using the line codec followed by a json filter worked as a workaround for json_lines. So I set out to put debug output everywhere in the Logstash code to see why.

After a little trip through the codec code, then the config parser to confirm it was working there, then back to the codecs, I found this in the line codec:

    @buffer.extract(data).each do |line|
      yield LogStash::Event.new("message" => @converter.convert(line))
    end

@converter is declared in register:

    @converter = LogStash::Util::Charset.new(@charset)
    @converter.logger = @logger

Now, I can only imagine that JSONLines reuses the JSON and Line codecs. What do you think it does? Overwrite the converter or something?

  public
  def initialize(params={})
    super(params)
    @lines = LogStash::Codecs::Line.new
    @lines.charset = @charset
  end

It seems to just change the charset parameter on the line codec after it has been created, and for some reason this is ignored when (or if) register is ever called. I'll try to dig a bit deeper to see why...
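A minimal sketch of one way an assignment like `@lines.charset = @charset` could end up ignored. It assumes the codec builds its converter while it is being constructed, which is an assumption and not something confirmed in this thread. `ToyConverter` and `ToyLineCodec` are hypothetical stand-ins written for illustration, not the real Logstash classes.

```ruby
# Toy stand-in for a charset converter (hypothetical, not LogStash::Util::Charset).
class ToyConverter
  attr_reader :charset

  def initialize(charset)
    @charset = charset
  end

  # Reinterpret the raw bytes as @charset, then transcode to UTF-8.
  def convert(data)
    data.dup.force_encoding(@charset).encode("UTF-8")
  end
end

# Toy stand-in for a line codec (hypothetical).
class ToyLineCodec
  attr_accessor :charset

  # ASSUMPTION: construction triggers register right away.
  def initialize(charset = "UTF-8")
    @charset = charset
    register
  end

  def register
    # The converter captures whatever @charset holds at this moment.
    @converter = ToyConverter.new(@charset)
  end

  def decode(line)
    @converter.convert(line)
  end

  def converter_charset
    @converter.charset
  end
end

# Pattern from the snippet above: build with defaults, then assign the charset.
late = ToyLineCodec.new
late.charset = "CP1252"

# Alternative: hand the charset over before the converter is built.
early = ToyLineCodec.new("CP1252")

raw = "caf\xE9".b # "café" as CP1252 bytes

puts late.converter_charset            # UTF-8, the later assignment never reached the converter
puts early.converter_charset           # CP1252
puts early.decode(raw)                 # café
puts late.decode(raw).valid_encoding?  # false, the bytes are still treated as UTF-8
```

In this toy version the late assignment only changes the attribute, while the converter keeps the charset it was created with. If the real codec behaves similarly, that would match the "not valid UTF-8" complaints, but that remains to be checked against the actual source.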

Apparently this bug has been around for about two years, has been reported a bunch of times, and still hasn't been fixed. Could someone do something about this?

@rmourao

rmourao commented Jul 9, 2015

Hi. Any idea when this issue will be fixed?

@colinsurprenant
Contributor

This was likely caused by piggybacking on the line codec, which was messing up the charset. It is most probably solved now since #18, where we removed the dependency on the line codec.

Closing. Feel free to reopen if you believe this is still an issue.
