Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential deadlock in managePublishQueue / Consider how paho.Publish should handle errors #196

Open
MattBrittan opened this issue Nov 13, 2023 · 1 comment

Comments

@MattBrittan
Copy link
Contributor

Describe the bug

In autopaho managePublishQueue we call PublishWithOptions and if an error is returned wait on the connection to drop before trying again (so we assume that if PublishWithOptions returns an error then it's fatal). This is not a valid assumption because publish may return at other times; for instance if c.PacketTimeout is exceeded.

Solving this is going to require a bit of thought and modifications to paho; consider publishQoS12:

func (c *Client) publishQoS12(ctx context.Context, pb *packets.Publish, o PublishOptions) (*PublishResponse, error) {
	c.debug.Println("sending QoS12 message")
	pubCtx, cf := context.WithTimeout(ctx, c.PacketTimeout)
	defer cf()

	ret := make(chan packets.ControlPacket, 1)
	if err := c.Session.AddToSession(pubCtx, pb, ret); err != nil {
		return nil, err
	}

So if we are attempting to send a lot of messages and c.Session.AddToSession blocks for, by default, 10 seconds then publishQoS12 returns an error without actually making an attempt to send the message (due to c.PacketTimeout). This makes it difficult for the caller to know what to do when it receives an error (should it retry or should it expect the connection to drop due to a protocol error).

I think there is a quick fix and a longer term project here:

  • Quick fix - Autopaho should retry publishing from the queue if PublishWithOptions returns an error (checking the connection status before doing so)
  • Longer term - Modify Publish such that it's clear if the error is fatal (connection will drop) or can be retried. It would also be good to clarify the meaning of c.PacketTimeout (personally I don't like the way it's used in Publish; the function already takes a Context so adding an additional timeout is a bit confusing).

Software used:

@master

@MattBrittan
Copy link
Contributor Author

The fix I have just merged should prevent this from being an issue with autopaho. However it would be good to review this code further and see if there is a better way of handling things (this might include looking at what to do if the client gets stalled due
to receive maximum depletion - should we drop the connection if this happens?).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant