Skip to content
This repository has been archived by the owner on Jun 29, 2023. It is now read-only.

Added close on the channel after a port failure #276

Merged
merged 1 commit into from
Jan 21, 2022

Conversation

mmichailidis
Copy link
Contributor

During a kubernetes deployment graylog may change location if the original pod closes and a new one opens. The DNS assigned to it will be updated appropriately but the gelf will never try to reconnect.

After a lot of debugging using the latest logstash-gelf and

openjdk 11.0.9.1 2020-11-04
OpenJDK Runtime Environment (build 11.0.9.1+1-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.9.1+1-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)

It was identified that in the UDPSender.java and lines 54-58 there was a strange issue. The first isConnected() was correctly reporting false but the inner isConnected() under the connect() function was incorrectly reporting true. The strange part was that the line that was forcing isConnected() to return false was the myChannel.read(byteBuffer) which was throwing an exception.

In order to resolve this, the channel will be closed as soon as the port is unreachable (as there is no need to keep it open with an unreachable port ).

In order to test this ( as I couldn't simple test it with code. ) we did test it outside kubernetes for easier control.
2 graylog installations were deployed on 2 separate machines ( A , B ) and a new hosts entry was inserted into the machine running the java app with the gelf. the hosts entry was pointing to one of the machines (in which graylog was running) and in the other graylog was off. After the first 3-4 logs produced and correctly published to graylog the hosts entry was changed to point to the B machine and closed the graylog of A / opened graylog of B. As a result the messages were never redirected to the new machine without this change.

@mp911de mp911de added the type: bug A general bug label Nov 22, 2021
@mp911de
Copy link
Owner

mp911de commented Nov 22, 2021

Thanks for submitting a pull request. Would it generally make sense to close the socket if an exception has happened?

@mmichailidis
Copy link
Contributor Author

Was actually thinking about it but the IOException is quite generic and it could be something random which would not require re-opening the channel. But at the same time close() is a no-op if the channel is already closed resulting in 0 overhead from this extra call.
As a result thinking about it now with clear mind it makes sense to close the channel on exception. Should I update the PR ?

@mp911de
Copy link
Owner

mp911de commented Nov 22, 2021

Thanks for your opinion. I primarily wanted to get another perspective, I'm happy to update the PR on my side.

I/O exceptions can happen due to obvious network reasons (connection reset by peer, TCP retransmit timeout) but also some server-side lagging (read timeout). I assume that read timeouts are a fine indicator for closing the connection. Network partitions or forced disconnects are as well a good justification to close an open socket early on.

@mmichailidis
Copy link
Contributor Author

Yes, you are right. Its better to early close and retry the connection instead of hanging to a problematic one.

@mp911de mp911de added this to the 1.15.0 milestone Jan 21, 2022
@mp911de mp911de merged commit fa6b152 into mp911de:main Jan 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
type: bug A general bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants