Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autopaho set DefaultPinger on ClientConfig can't reconnect to server after disconnection #227

Closed
minyukim opened this issue Jan 13, 2024 · 6 comments · Fixed by #229
Closed

Comments

@minyukim
Copy link
Contributor

Describe the bug
Autopaho set DefaultPinger on ClientConfig can't reconnect to the server after disconnection
Found this bug while using DefalutPinger with logger to test Pinger with autopaho (I guess this is only usecase for now)

To reproduce

  1. Set PingHandler to DefaultPinger on ClientConfig and autopaho.NewConnection
pinger := paho.NewDefaultPinger()
cfg := autopaho.ClientConfig{
    ...
    ClientConfig: paho.ClientConfig{
        ....
        PingHandler: pinger,
        ....
    }
    ...
}
cm, err := autopaho.NewConnection(ctx, cfg)
  1. Disconnect client network and wait for client being closed
  2. Reconnect network -> But autopaho can't reconnect to the server

Debug output

2024/01/13 10:00:54 connecting
2024/01/13 10:00:54 sending CONNECT
2024/01/13 10:00:54 waiting for CONNACK/AUTH
2024/01/13 10:00:55 received CONNACK
mqtt connection up
client error: ping handler error: Run() already called
2024/01/13 10:00:55 received CONNACK, starting PingHandler
2024/01/13 10:00:55 starting publish packets loop
2024/01/13 10:00:55 starting incoming
2024/01/13 10:00:55 subscribing to …
2024/01/13 10:00:55 PacketSent() returning due to done channel
2024/01/13 10:00:55 waiting for SUBACK
2024/01/13 10:00:55 returning from ping handler worker
2024/01/13 10:00:55 error called: ping handler error: Run() already called
2024/01/13 10:00:55 client stopped
2024/01/13 10:00:55 stop() called with error: <nil>
2024/01/13 10:00:55 ping stopped
2024/01/13 10:00:55 conn closed
2024/01/13 10:00:55 acks tracker reset
2024/01/13 10:00:55 session updated, waiting on workers
2024/01/13 10:00:55 client stopping, incoming stopping
2024/01/13 10:00:55 returning from incoming worker
2024/01/13 10:00:55 queue got connection
2024/01/13 10:00:55 returning from publish packets loop worker
2024/01/13 10:00:55 workers done
2024/01/13 10:00:55 handleError received error: ping handler error: Run() already called
2024/01/13 10:00:55 handleError passed error on: ping handler error: Run() already called

Expected behaviour
autopaho should be able to reconnect to the server even previous client was closed.

Software used:

Additional context
It happens because the values of Pinger in paho.ClientConfig is changed and autopaho uses this changed values to reconnect to the server

func (p *DefaultPinger) Run(conn net.Conn, keepAlive uint16) error {
	if keepAlive == 0 {
		p.debug.Println("Run() returning immediately due to keepAlive == 0")
		return nil
	}
	if conn == nil {
		return fmt.Errorf("conn is nil")
	}
	p.mu.Lock()
	if p.timer != nil {
		p.mu.Unlock()
		return fmt.Errorf("Run() already called")
	}
	select {
	case <-p.done:
		p.mu.Unlock()
		return fmt.Errorf("Run() called after stop()")
	default:
	}
	p.keepAlive = keepAlive
	p.conn = conn
	p.timer = time.AfterFunc(0, p.sendPingreq) // Immediately send first pingreq
	p.mu.Unlock()

	return <-p.errChan
}

timer is set while Run()
so the timer of the Pinger in paho.ClientConfig is not nil after that, and it causes return fmt.Errorf("Run() already called") while trying reconnection
It's not only problem of timer, but done(+ other channels I guess?) and stopOnce should also be initialized in order to work correctly

This is not a critical bug because users don't need logger for production, but it would be good to fix I guess.

@minyukim
Copy link
Contributor Author

minyukim commented Jan 13, 2024

I thought something like reset() at the end of Run() so that the value of Pinger can be reset after the Pinger stops like this,

func (p *DefaultPinger) Run(conn net.Conn, keepAlive uint16) error {
...
	err := <-p.errChan
	p.reset()
	return err
}

But stopped Pinger with error causes client stops,

if err := c.config.PingHandler.Run(c.config.Conn, keepalive); err != nil {
	go c.error(fmt.Errorf("ping handler error: %w", err))
}

And it causes Pinger stops again, so some values become dirty again

func (c *Client) close() {
	...
	c.config.PingHandler.Stop()
	...
}

@MattBrittan
Copy link
Contributor

This is not a critical bug because users don't need logger for production, but it would be good to fix I guess.

Thanks for reporting this so quickly; great bug report, but I'm not sure that I agree with the above because I believe the error will prevent the autopaho from reconnecting (ping handler returning error leads to call to c.error which should drop connection). I'll take a look later today (GMT+13) if @vishnureddy17 does not do so before that (adding Reset seems sensible).

@minyukim
Copy link
Contributor Author

minyukim commented Jan 14, 2024

Opened a PR with reset() approach. That's the simplest way I can think of.
Please let me know your opinions about it.

@MattBrittan
Copy link
Contributor

Thanks for the PR. I'm going to have a quick look at an alternative approach; being able to reuse the pinger and having a stop() function complicates things a bit given that the pinger needs to be reusable. I'm just about to have a scan through the code (wondered if it might be possible to simplify things a bit).

@MattBrittan
Copy link
Contributor

the error will prevent the autopaho from reconnecting (ping handler returning error leads to call to c.error which should drop connection)

Clarifying this; the issue will prevent autopaho from reconnecting IF a pinger is set in ClientConfig (otherwise the pinger will be created afresh each time so everything works). This means we have a couple of options; allow Run() to be called multiple times OR reset the pinger prior to each reconnection. I believe the first approach is simplest (otherwise autopaho will need a different pinger interface).

I started looking at using Context to handle this, and it was one of those changes that just seemed to expand. PR (#229) addresses this issue but also makes a range of other fairly major changes (I believe it makes it easier to follow the termination process).

@vishnureddy17 apologies; my PR significantly changes the implementation of Pinger you recently submitted. I'm not committed to making this change but I do feel it's considerably simpler (assuming I have not missed anything). Would really appreciate your thoughts.

The PR also makes it clear that Client is a single use object. There was no way to reuse Client after the config was made private (previously you could, potentially, change the connection in ClientConfig and call Connect again, but I'm unsure if that would have actually worked). This change makes it explicit that Connect can only be called once. Happy to change this if anyone disagrees!

@minyukim
Copy link
Contributor Author

@MattBrittan Just reviewed your PR, and like that approach.
I'm going to close my PR right after yours is merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants