Fix BufferedTokenizer to properly resume after a buffer full condition respecting the encoding of the input string #16968
base: main
Conversation
cc @andsel
Uncovered use cases

This is a bugfix on the original code to solve the problem of respecting the input encoding. Check with the pipeline below:
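The exact pipeline is a plausible reconstruction: the `json_lines` codec and the 100-byte `decode_size_limit_bytes` limit are assumptions inferred from the loading script below.

```
bin/logstash -e "input { tcp { port => 1234 codec => json_lines { decode_size_limit_bytes => 100 } } } output { stdout { codec => rubydebug } }"
```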
and a loading script like:

```ruby
require 'socket'
require 'json'

hostname = 'localhost'
port = 1234

socket = TCPSocket.open(hostname, port)
data = {"a" => "a" * 10}.to_json + "\n" + {"b" => "b" * 105}.to_json
socket.write(data)
socket.close
```

it produces an output like:
Ideal solution

To solve this problem, the …
Release notes
[rn:skip]
What does this PR do?
This is a second take at fixing the processing of tokens from the tokenizer after a buffer full error. The first try #16482 was rolled back because of the encoding error #16694.
The first attempt failed to return the tokens in the same encoding as the input.
This PR does a couple of things:
- uses the `concat` method instead of `addAll`, which avoids converting RubyString to String and back to RubyString;
- when returning the head `StringBuilder`, it enforces the encoding of the input charset (see the sketch below).
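A minimal Ruby sketch of the encoding point, not the actual Logstash code, illustrating what "enforcing the encoding of the input charset" means when the accumulated head is handed back:

```ruby
# Illustration only: if the accumulated head is returned without re-applying the
# input encoding, the bytes get a wrong label and are misread downstream.
input = "caf\xE9\n".force_encoding(Encoding::ISO_8859_1)  # 0xE9 is 'é' in ISO-8859-1

head  = input.chomp.dup.force_encoding(Encoding::BINARY)  # an encoding-losing copy
fixed = head.dup.force_encoding(input.encoding)           # what the fix enforces

puts head.encoding   # => ASCII-8BIT (wrong label)
puts fixed.encoding  # => ISO-8859-1 (matches the input)
```

The actual change lives in the Java `BufferedTokenizerExt`; the Ruby snippet only mirrors the intent.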
Why is it important/What is the impact to the user?

It permits the tokenizer to be used effectively also in contexts where a line is bigger than the configured size limit.
Checklist
- [ ] I have made corresponding changes to the documentation
- [ ] I have made corresponding change to the default configuration files (and/or docker env variables)

Author's Checklist
How to test this PR locally
The test plan has two sides: verifying that the tokenizer properly resumes after a buffer full condition, and verifying that the encoding of the input is respected.
How to test the encoding is respected
Start up a REPL with Logstash and exercise the tokenizer:
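A sketch of what such a REPL session could look like, assuming the Java tokenizer is exposed to Ruby as `FileWatch::BufferedTokenizer` with an `extract` method (adjust names if the entry point differs):

```ruby
# bin/logstash -i irb
buftok = FileWatch::BufferedTokenizer.new("\n", 100)   # delimiter, size limit

# A chunk in ISO-8859-1 containing '£' (byte 0xA3)
data = "first \xA3\nsecond".force_encoding(Encoding::ISO_8859_1)

buftok.extract(data).each do |token|
  puts "#{token.inspect} -> #{token.encoding}"         # expect ISO-8859-1
end
```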
or use the following script
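A plausible version of such a script (hostname, port, and payload are assumptions matching the pipeline below):

```ruby
require 'socket'

socket = TCPSocket.open('localhost', 1234)
# ISO-8859-1 payload containing '£' (byte 0xA3)
data = "a line with a pound sign \xA3\n".force_encoding(Encoding::ISO_8859_1)
socket.write(data)
socket.close
```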
with Logstash run as:

```
bin/logstash -e "input { tcp { port => 1234 codec => line { charset => 'ISO8859-1' } } } output { stdout { codec => rubydebug } }"
```
In the output the `£` has to be present, and not a mis-decoded sequence such as `Â£`.
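For reference, a short illustration (assuming the broken behaviour manifests as a double-encoded pound sign) of how `£` turns into `Â£` when its UTF-8 bytes are re-labelled with the wrong charset:

```ruby
utf8 = "£"                                               # UTF-8 bytes: 0xC2 0xA3
mislabeled = utf8.dup.force_encoding(Encoding::ISO_8859_1)
puts mislabeled.encode(Encoding::UTF_8)                   # => "Â£" (the broken output)
```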
Related issues
- Encoding error in `BufferedTokenizerExt`: #16694