Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: idsse-795: PublishConfirm retry connection #68

Merged
merged 7 commits into from
Jul 22, 2024

Conversation

mackenzie-grimes-noaa
Copy link
Contributor

@mackenzie-grimes-noaa mackenzie-grimes-noaa commented Jul 18, 2024

Linear Issue

IDSSE-795

Changes

  • If publish() throws exception, kill the PublishConfirm thread to start fresh (recreate connection and channel), then try publish a 2nd time

Explanation

This also included a huge chore of cleaning up tech debt in the test_publish_confirm class. I had previously written some convoluted MockPika class that maintained its state and invoked callback chains... I couldn't no longer make sense of it, and it was very hard to test or mock. Tests were passing individually but failing when run as a group.

In this PR, I rewrote the mock Connection and mock Channel using plain old pytest @fixtures to track the "state" of pika variables. Now it should be a bit easier to test PublishConfirm moving forward.

@mackenzie-grimes-noaa mackenzie-grimes-noaa marked this pull request as ready for review July 18, 2024 17:45
Copy link
Contributor

@Geary-Layne Geary-Layne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

except Exception as e: # pylint: disable=broad-exception-caught
logger.error('Publish message problem : (%s) %s', type(e), str(e))
return False
except Exception as exc: # pylint: disable=broad-exception-caught
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to specify an exception rather than disabling linter, I used AMQPConnectionError in PublisherSync. Still rather broad, maybe there is a better one.

# create new Thread, abandoning old one (it will shut itself down)
self._create_thread()
if not self._wait_for_channel_to_be_ready():
logger.error('Second attempt to connect to RabbitMQ failed. Cannnot publish')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a warning since it is possible to recover from.

routing_key,
json.dumps(message, ensure_ascii=True),
properties)
except Exception as retry_exc: # pylint: disable=broad-exception-caught
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If at all possible don't disable linter in operational code, I don't mind it in test (but we probably do it to much there too)


def _is_running(self) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason to not make this a property?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that I could for this one, since the underlying code (Thread.is_alive()) is type Callable[[None], bool] instead of simply bool.

Can the property tag do some magic to invoke a function and get the resulting True/False when you access the property?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure you could make this a property by just adding the annotation, and it would work. However, when I read this code the first time I thought _is_connected() and _is_running() are the same type of functions and thus should have same way of accessing. I still think the access should be the same but would now suggest not making _is_connected() a property, I'm not sure if a private(protected) property is very pythonic. But in the end, I don't think it really matters.

type(exc), str(exc))

# create new Thread, abandoning old one (it will shut itself down)
self._create_thread()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think with will cause a problem, but do we need to create a new thread? Isn't it possible that the connection has gone down but the thread is still good?

Copy link
Contributor Author

@mackenzie-grimes-noaa mackenzie-grimes-noaa Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible, but I couldn't figure out how to interrupt the running thread without it exiting/completing.

The way the thread task is written (in the _run() method), it creates a new connection, calls pika's ioloop.start() with it, and sleeps on a loop until something sets self._stopping to True. Once that changes to True, the thread returns.

So I'm not sure how we would replace the self._connection out from under the thread, unless we had some clever way for the thread to watch for its connection to change and start ioloop on the new connection instance if it does.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a lot cleaner

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's funny, those were some robust function names.

@mackenzie-grimes-noaa mackenzie-grimes-noaa merged commit 84c03d6 into main Jul 22, 2024
2 checks passed
@mackenzie-grimes-noaa mackenzie-grimes-noaa deleted the bug/pub-conf-retry-connection branch July 22, 2024 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants