
Backport TCP window, client closing fixes to 6.0.x #50

Merged

Conversation


@yaauie yaauie commented Nov 3, 2022

Because 6.1.0 changed this plugin's dependency on Logstash to require Logstash 8 and we still commercially support Logstash 7.13+, I have branched from 6.0.2 and am backporting relevant fixes to the 6.0.x series.

This is not typical SemVer, and I would separately like to find a way to get 6.1.x of this plugin running on Logstash 7, even if that means going without one or more of the features introduced in the 6.1 series (e.g., it works unless you try to use the new feature on LS 7).


Here, we manually backport two main fixes, each in its own commit for clarity:

We also:

Unfortunately, in addition to breaking compatibility with LS 7.x, #47 also introduced two bugs that were fixed in #49 (repeating thread names, logger access) and one more that is first addressed here (a crash while closing the plugin) and will need to be forward-ported, so the lines are muddled.

I have elected to group whole fixes by commit as much as possible, instead of doing true backports.

@yaauie yaauie changed the title from "Backport tcp window support to 6 0 x" to "Backport TCP window, client closing fixes to 6.0.x" on Nov 3, 2022
@yaauie yaauie marked this pull request as draft November 3, 2022 17:59
@yaauie yaauie force-pushed the backport-tcp-window-support-to-6-0-x branch from a4c726e to 879be71 on November 3, 2022 18:10

@yaauie yaauie left a comment


I added a minimal set of specs and found multiple additional bugs around shutdown sequencing.

They are all fixed here, and will need to be forward-ported to main/6.1.

if server?
  # server-mode clean-up
  @closed.make_true
  @server_socket.shutdown rescue nil if @server_socket

Will need to be forward-ported to the main/6.1 series, which sends TCPServer#close instead of this TCPServer#shutdown, and causes the plugin to crash during shutdown because a blocking call on TCPServer#accept throws an IOError when the server is closed out from under it.
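For illustration, the failure mode is roughly the following (a minimal sketch with made-up names, not the plugin's actual code):

require 'socket'

server = TCPServer.new('127.0.0.1', 0)

acceptor = Thread.new do
  loop do
    begin
      client = server.accept # blocks waiting for the next connection
      client.close
    rescue IOError
      # closing the server from another thread raises IOError in the blocked
      # accept; without rescuing it, shutdown crashes the plugin
      break
    end
  end
end

server.close # 6.1-style shutdown: closes the listener out from under accept
acceptor.join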

  end
else
  # client-mode clean-up
  @client_socket&.close

All related @client_socket bits will need to be forward-ported to the main/6.1 series. It cannot be a local variable in register, because we need to be able to close it in our close method.
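For illustration only, the shape being described is roughly this (simplified, with the plugin's reconnect and error handling omitted):

def register
  # keep the socket in an instance variable so other methods can reach it
  @client_socket = TCPSocket.new(@host, @port) unless server?
end

def close
  # if the socket were a local variable inside register, there would be
  # nothing here for close to call #close on
  @client_socket&.close
end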


# Now send the payload
client_socket.syswrite(payload) if w.any?
@logger.trace("transmitting #{payload.bytesize} bytes", host: @host, port: @port, socket: @client_socket&.peer) if @logger.trace? && payload && !payload.empty?

Logging should be forward-ported to the main/6.1 series.

:exception => e, :backtrace => e.backtrace)
client_socket.close rescue nil
client_socket = nil
log_warn "client socket failed:", e, host: @host, port: @port, socket: @client_socket&.peer

TCPSocket#to_s is useless object inspection, so we should send TCPSocket#peer, which gives us useful information about the socket we are connected to. This will need to be forward-ported to main/6.1 too.
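For illustration, a peer description can be derived from the stdlib's Socket#peeraddr; something along these lines (the helper name here is hypothetical, not necessarily the plugin's implementation):

# derive a useful "ip:port" description of the remote end of a socket
def peer_for(socket)
  _family, port, _hostname, ip = socket.peeraddr
  "#{ip}:#{port}"
rescue IOError, Errno::ENOTCONN
  nil # socket already closed, or never connected
end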

@@ -73,4 +73,109 @@
end
end
end

context 'client mode' do

All new specs will need to be forward-ported to the main/6.1 series as well.

@yaauie yaauie force-pushed the backport-tcp-window-support-to-6-0-x branch from f0b8b20 to ea5c5c9 on November 3, 2022 23:25
@yaauie yaauie force-pushed the backport-tcp-window-support-to-6-0-x branch 2 times, most recently from 8f6b61a to dd06530 on November 4, 2022 00:30
@yaauie yaauie force-pushed the backport-tcp-window-support-to-6-0-x branch from dd06530 to 3a6113d on November 4, 2022 00:33

yaauie commented Nov 4, 2022

CI is failing intermittently for the cross-ported fixes for server mode, and I am having a very difficult time figuring out why, as everything is consistently green in local Docker tests (on my M1 Mac) with identical parameters. Every single IO we have control over is written in a way that makes it tolerant of partial writes, and still we are occasionally seeing silent truncation on the 100MB test.

Since 3a6113d all of the failures have been in the Logstash 7 series, so I am beginning to think that I have found a bug in JRuby.


yaauie commented Nov 4, 2022

🤦🏼 When I reduced the amount of memory available to my local Docker, I was able to reproduce.

In the test code I was reading until EOF instead of until the IO was closed, and the memory pressure slowed down reads enough that I hit EOF before the writer was done writing.

EDIT: that wasn't it either. Because server-mode's Outputs::TCP#receive does not block, we need to wait to send close to the plugin until the payload is "probably" fully-written. We do this by blocking until the StringIO that is receiving the payload is approximately the right size.

Because server-mode's TCP#receive does not block, we need to wait until the client thread has popped the payload from its queue and sent it over the socket before telling the TCP output to close, or else our close interrupts an in-flight event and our specs get a partial payload.
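For illustration, the wait the specs ended up needing looks roughly like this (variable names here are hypothetical, not the actual spec code):

# block until the StringIO collecting the server-mode output has received the
# whole payload, then close the plugin, so close doesn't interrupt an
# in-flight event
deadline = Time.now + 30
until received_io.size >= payload.bytesize
  raise 'timed out waiting for the payload to arrive' if Time.now > deadline
  sleep 0.1
end
plugin.close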
@yaauie yaauie marked this pull request as ready for review November 4, 2022 07:55
@andsel andsel self-requested a review November 4, 2022 08:02

@andsel andsel left a comment


LGTM; I've left a comment about one doubt I had.


andsel commented Nov 4, 2022

This is a follow-up thought I had while reviewing the code, but it's more related to the plugin implementation than to this PR. I would really appreciate hearing your thoughts.

I think we have a problem in the shutdown of the client threads in server mode.
Each thread executes a busy loop in Client#run, without any guard on an exit condition, essentially doing:

loop do
  begin
    payload = @queue.pop
    @socket.write(payload)
  rescue 
    break  
  end
end

The close method of the Client is:

def close
  @socket.close
end

The client threads could be blocked in two places:

  • queue.pop
  • socket.write

If a thread is blocked on queue.pop because the queue is empty, then the close method doesn't have any effect. In this case the thread should be interrupted (with something like Java's Thread.interrupt()) to force it to exit, and the loop should also test for interruption so that it can exit.
If it's blocked on socket.write, then socket.close generates an exception in the loop, which makes it exit.
Also, the plugin's stop method shuts down the server socket, stopping the acceptance of new connections and the spawning of new client threads, but it doesn't join the already-created threads. That is implicitly done by the close of the socket: being a blocking call, we know that when it returns the socket is closed, the Client received an error, and it exited. But I think we depend too much on the scheduler here; it could happen that the plugin has terminated while one of the client threads has yet to reach the socket.write that generates the exception and forces it to break the loop.

So my point is that we need to improve the shutdown sequence, interrupting the thread and joining it.
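For illustration, one possible shape for that is sketched below (not the plugin's current code; in Ruby, Queue#close can play the role of the interruption, since it wakes a thread blocked in Queue#pop and makes pop return nil once the queue is drained):

class Client
  def initialize(socket)
    @socket = socket
    @queue  = Queue.new
    @thread = Thread.new { run }
  end

  def run
    while (payload = @queue.pop) # nil once the queue is closed and drained
      @socket.write(payload)
    end
  rescue IOError, SystemCallError
    # a closed or failed socket also ends the loop
  end

  def close
    @queue.close           # wake a thread blocked in @queue.pop
    @thread.join(5)        # bounded wait for the worker to drain and exit
    @socket.close rescue nil
  end
end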


yaauie commented Nov 4, 2022

So my point is that we need to improve the shutdown sequence, interrupting the thread and joining it.

I agree. There are a lot of places in this plugin that need improvement; in server mode, for example, the queue for each client is also unbounded.

In the end, though, this PR makes the plugin significantly better than it was before and I would rather deliver these improvements immediately and chase down the further fixes separately.


It's not clear to me why we need to sleep 10ms if the payload contains data

You are correct; both IO#write and IO#syswrite calls should block until the underlying IO is writable, and so the backoff sleep isn't strictly necessary. I'll back that out.
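For illustration, a partial-write-tolerant loop of the kind described needs no extra sleep precisely because of that blocking behaviour (a sketch with a made-up helper name, not the plugin's exact code):

# keep writing until the whole payload has been accepted; syswrite may write
# only part of the buffer, but it blocks until it can write something
def write_fully(socket, payload)
  until payload.empty?
    written = socket.syswrite(payload)
    payload = payload.byteslice(written..-1)
  end
end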


I'll then merge this PR, forward-port the relevant bits to the 6.1 series, and then file separate work to chase down these other issues (including, hopefully, making 6.1 compatible with Logstash 7.x again so that we don't have to maintain two significantly-diverged branches).


andsel commented Nov 4, 2022

I'm OK to merge this PR as it is. A lot of work has been done!

My last comment was intended to open a discussion on what I saw while reviewing it, but it's not related to this specific PR.

@yaauie yaauie merged commit 379d6ca into logstash-plugins:6.0.x Nov 4, 2022
@yaauie yaauie deleted the backport-tcp-window-support-to-6-0-x branch November 4, 2022 17:23
@yaauie yaauie mentioned this pull request Nov 4, 2022