Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tcp/socks_connecters do not handle ZMQ_RECONNECT_IVL of -1 #3158

Closed
ehilscher opened this issue Jun 6, 2018 · 5 comments
Closed

tcp/socks_connecters do not handle ZMQ_RECONNECT_IVL of -1 #3158

ehilscher opened this issue Jun 6, 2018 · 5 comments

Comments

@ehilscher
Copy link
Contributor

Issue description

The ZMQ socket option ZMQ_RECONNECT_IVL supports a value of -1, which is intended to turn off reconnection attempts. When an initial connection attempt fails, tcp_connecter and socks_connecter use options.reconnect_ivl to set a timer to retry the initial connection. But they do not check for -1, and so the timer value is invalid (observed to be a point in the past). It could be argued that a timer should not be set at all when options.reconnect_ivl is -1.

Possible solution:
zmq::tcp_connecter_t::add_reconnect_timer and zmq::socks_connecter_t::add_reconnect_timer are modified:
if (options.reconnect_ivl != -1) {
const int interval = get_new_reconnect_ivl();
add_timer(interval, reconnect_timer_id);
socket->event_connect_retried(endpoint, interval);
reconnect_timer_started = true;
}

An alternative solution is to add an additional socket option (such as ZMQ_CONNECTION_RETRY_IVL) that is only used by the connecters, and could also be -1. This would allow users to let ZMQ retry the initial connection, but not a dropped connection, if desired.

Environment

  • libzmq version: 4.2.3
  • OS: Windows 10

Steps to reproduce the issue

  1. Place a breakpoint in zmq::tcp_connecter_t::add_reconnect_timer.
  2. Create a socket.
  3. Set the ZMQ_RECONNECT_IVL to -1
  4. Set the ZMQ_RECONNECT_IVL_MAX to 0 (should be the default)
  5. Call connect on the socket with a TCP address (ensure that there is no listening server at that address).
  6. Step through zmq::tcp_connecter_t::add_reconnect_timer:
    Psuedocode:
    int interval = -1 + generate_random () % -1;
    uint64_t expiration = clock.now_ms () + interval; // With MSVC this always results in some time in the past

What's the actual result?

A timer is set for some time in the past to retry an initial connection.

What's the expected result?

A timer is not set to retry an initial connection.

@bluca
Copy link
Member

bluca commented Jun 7, 2018

I can't quite reproduce this. With wireshark it's very easy to see - a connection to a non-existing peer is attempted only once with IVL at -1, but many many times without it.
What is the problem exactly?

@ehilscher
Copy link
Contributor Author

I am seeing the opposite effect. Here is my entire test program:

void* context = zmq_ctx_new();
void* socket = zmq_socket(context, ZMQ_REQ);

int ivl = -1;
zmq_setsockopt(socket, ZMQ_RECONNECT_IVL, &ivl, sizeof(ivl));

zmq_connect(socket, "tcp://127.0.0.1:59999");

while(true)
{
	std::this_thread::sleep_for(std::chrono::milliseconds(1000));
}

I've attached a screenshot of the packet capture from letting this program run for about 30 seconds (filtered by port 59999).

rawcap

@bluca
Copy link
Member

bluca commented Jun 7, 2018

Very weird. Maybe something to do with Windows.
With the fix you suggested, does it work? If so please send a PR

@ehilscher
Copy link
Contributor Author

Created PR 3163.

My theory is that even on other platforms this problem occurs, but it's partially dependent on the generated random number, and how the % operator handles -1. So even on your system I would bet a reconnect timer is getting set, but it may not trigger for a day, or a month, or even 100 years.

@bluca bluca closed this as completed Jun 12, 2018
@bluca
Copy link
Member

bluca commented Jun 12, 2018

Fixed by #3163

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants