Fix YBInboundConnectionContext::HandleTimeout timer rescheduling #2964

Closed
ttyusupov opened this issue Nov 19, 2019 · 0 comments
Labels: kind/bug (This issue is a bug)

@ttyusupov

Under certain network failures, TcpStream::DoWrite does not attempt to write anything to the TCP socket and therefore does not update last_write_time_. As a result, YBInboundConnectionContext::HandleTimeout, the handler that sends RPC heartbeats, reschedules its timer to fire again immediately, because last_write_time_ quickly becomes too old. This leads to an unbounded number of RPC heartbeats being put into the TcpStream::sending_ queue.
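
To make the rescheduling problem concrete, here is a minimal sketch of how a timer delay derived only from the last write time collapses to zero once that timestamp goes stale. It is not the actual YugabyteDB code: `NaiveDelay`, `GuardedDelay`, and the `last_heartbeat_time` parameter are hypothetical names introduced for illustration, and the guarded variant is only one plausible way to avoid the immediate restart.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;
using Duration = Clock::duration;
using TimePoint = Clock::time_point;

// Hypothetical stand-in for the problematic behavior: the next timer delay
// depends only on the last successful write. If writes stop, the result is
// zero on every call, so the timer fires again immediately.
Duration NaiveDelay(TimePoint now, TimePoint last_write_time, Duration period) {
  assert(period > Duration::zero());
  return std::max(Duration::zero(), last_write_time + period - now);
}

// Assumed fix (illustrative only): also take into account when the last
// heartbeat was queued, so a stale write timestamp cannot force a zero delay
// right after a heartbeat has been scheduled.
Duration GuardedDelay(TimePoint now, TimePoint last_write_time,
                      TimePoint last_heartbeat_time, Duration period) {
  assert(period > Duration::zero());
  const TimePoint base = std::max(last_write_time, last_heartbeat_time);
  return std::max(Duration::zero(), base + period - now);
}
```

With a write timestamp that is ten periods old, `NaiveDelay` returns zero on every invocation, while `GuardedDelay` returns a full period once a heartbeat has just been queued.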

@ttyusupov ttyusupov added the kind/bug This issue is a bug label Nov 19, 2019
@ttyusupov ttyusupov self-assigned this Nov 19, 2019
ttyusupov added a commit that referenced this issue Nov 20, 2019
Summary:
Under certain network failures, `TcpStream::DoWrite` does not attempt to write anything to the TCP socket and therefore does not update `last_write_time_`. As a result, `YBInboundConnectionContext::HandleTimeout`, the handler that sends RPC heartbeats, reschedules its timer to fire again immediately, because `last_write_time_` quickly becomes too old. This leads to an unbounded number of RPC heartbeats being put into the `TcpStream::sending_` queue.

- Fixed YBInboundConnectionContext::HandleTimeout.
- Added debug VLOGs.

Test Plan: Jenkins + manually with `iptables` on test cluster.

Reviewers: mikhail, sergei

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D7590
ttyusupov added a commit that referenced this issue Nov 28, 2019
Summary:
In case of network errors, RPC heartbeats could accumulate in the `TcpStream::sending_` queue
until the `TcpStream` is closed or recovers. Added a restriction so that another heartbeat is not
queued while the previous one is still in the queue.

Test Plan: Jenkins

Reviewers: bogdan, raju, sergei

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D7635
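
The restriction described in the D7635 summary can be pictured with a small sketch. This is an assumption-laden illustration rather than the real `TcpStream`: the class `SketchStream`, its methods, and the `heartbeat_in_queue_` flag are hypothetical, and a `std::deque` of strings stands in for the `sending_` queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical model of a stream with an outbound queue. It only shows the
// "at most one heartbeat in the queue" rule, not real socket handling.
class SketchStream {
 public:
  // Queues a heartbeat unless one is already waiting to be written.
  // Returns true if a heartbeat was queued by this call.
  bool MaybeQueueHeartbeat() {
    if (heartbeat_in_queue_) {
      return false;  // Previous heartbeat has not been written yet.
    }
    sending_.push_back("heartbeat");
    heartbeat_in_queue_ = true;
    return true;
  }

  // Simulates the front item being written to the socket successfully.
  // Precondition: the queue is not empty.
  void PopSent() {
    assert(!sending_.empty());
    if (sending_.front() == "heartbeat") {
      heartbeat_in_queue_ = false;
    }
    sending_.pop_front();
  }

  std::size_t queued() const { return sending_.size(); }

 private:
  std::deque<std::string> sending_;  // Stand-in for TcpStream::sending_.
  bool heartbeat_in_queue_ = false;
};
```

While the connection is stalled, repeated calls to `MaybeQueueHeartbeat` leave the queue at a single heartbeat, so the queue length stays bounded no matter how often the timer fires.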
ttyusupov added a commit that referenced this issue Jan 17, 2020
Differential Revision: https://phabricator.dev.yugabyte.com/D7590
ttyusupov added a commit that referenced this issue Jan 17, 2020
Differential Revision: https://phabricator.dev.yugabyte.com/D7635
carlos-username pushed a commit to carlos-username/yugabyte-db that referenced this issue Mar 11, 2020
…scheduling

Differential Revision: https://phabricator.dev.yugabyte.com/D7590
carlos-username pushed a commit to carlos-username/yugabyte-db that referenced this issue Mar 11, 2020
Differential Revision: https://phabricator.dev.yugabyte.com/D7635