heartbeat and long running workers #301
Comments
Yes, because PHP is single-threaded, this is quite hard to fix. The latest release of php-amqplib has some fixes in the heartbeats area, BTW. If you have any ideas about how we could handle heartbeats in php-amqplib, I would like to hear them. |
@videlalvaro no idea at all... I was trying to reproduce the same problem with Elixir, and obviously the base Erlang library handles heartbeats in a separate process. In PHP there are some libraries that try to address the problem, such as https://github.com/kriswallsmith/spork. We solved the issue by raising the heartbeat to a reasonable value that should work for all our workers. Suboptimal, but it's working. If someone has another idea, maybe we could join forces and work on it together! |
@matteosister here is a proposal: https://github.com/videlalvaro/RabbitMqBundle/issues/325. We could also start a timer at the beginning of message processing and, once processing finishes, check whether it took longer than the socket timeout; if so, reconnect. For Doctrine's DBAL Connection we made a wrapper and added try-catch logic to each method: we catch timeout exceptions and re-run the query afterwards. I think that would work for AMQP as well. |
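Editor's sketch of the two ideas in the comment above, in plain PHP with no library calls. The function names, the `$reconnect` callback, and the use of `\RuntimeException` as the "connection lost" error are all hypothetical stand-ins, not part of the bundle or of php-amqplib; substitute whatever reconnect call and exception types your client actually provides.

```php
<?php
declare(strict_types=1);

/**
 * Runs $work and measures how long it took. If it took at least
 * $timeoutSeconds (the socket timeout or heartbeat), the connection is
 * assumed to be dead and $reconnect is called before returning.
 * Hypothetical helper, for illustration only.
 */
function runThenReconnectIfStale(callable $work, callable $reconnect, float $timeoutSeconds)
{
    $startedAt = microtime(true);
    $result = $work();

    if (microtime(true) - $startedAt >= $timeoutSeconds) {
        $reconnect();
    }

    return $result;
}

/**
 * Wrapper in the spirit of the DBAL approach: run $operation, and if it
 * fails with the stand-in exception, reconnect and run it again.
 * Gives up and rethrows after $maxAttempts attempts.
 */
function retryAfterReconnect(callable $operation, callable $reconnect, int $maxAttempts = 2)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (\RuntimeException $e) {
            if ($attempt >= $maxAttempts) {
                throw $e;
            }
            $reconnect();
        }
    }
}
```

In a consumer, `$work` would be the slow external call and `$operation` the publish at the end of the job. Note that neither helper keeps the connection alive during processing; they only recover once the work has finished.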
@matteosister how did you solve the issue? I think I tried every combination of heartbeat values, but without any luck. The message remains unacked if the consumer takes longer to process it (about 100s). |
@vcraescu , You can solve this issue as following:
However, it's better to also track the state of calls in a database, so you can recover executions that terminated unexpectedly (lost messages) without requeuing the message. |
@Snake-Tn I can't re-push the same message to the queue. That's the original problem. After ~100s I can't communicate with RabbitMQ at all, because it throws "Broken pipe or connection closed". If I move to a database solution, why would I keep using a RabbitMQ queue at all? |
@vcraescu |
@Snake-Tn But I'm not even able to push the message via a producer because of the very same reason. |
@vcraescu, I checked with one of this repo's maintainers: it's possible to define a connection per producer/consumer (by default they share the same default connection). However, the producer connection is established on service creation, so it is going to time out as well. (They are planning to make it lazy in a future release.) |
@Snake-Tn And how am I supposed to use lower-level function calls from the consumer? |
@vcraescu we had the same problems in production. Our consumer processes rows from a CSV and publishes the results to another queue at the end. We got around the issue using a few things:
These steps have sorted the issue for us, and we're now handling millions of publishes from consumers using this process. Some of the consumers are busy for hours without connection failures. The odd one dies, but it is then picked up by the retry mechanism. |
@andrefigueira Did you keep using the bundle, or did you implement everything using direct calls to RabbitMQ? Thank you for your detailed answer! |
Would it maybe be possible to pass a callback to the |
Same as @vcraescu: I tried all combinations of heartbeat, read_write_timeout and lazy in the bundle, but always received "unable to write to socket [104]: Connection reset by peer" when the consumer takes more than 5s to process a message. |
Getting the same issue. Here is my workaround. |
@srgkas could you shed some light how you send the "ping", because unfortunately the solution based on https://blog.mollie.com/keeping-rabbitmq-connections-alive-in-php-b11cb657d5fb to call
|
I have also had a lot of problems with this. Since the solution from Mollie broke in 2.9.2, I gave up; we now use timeouts so long that our workers finish before they expire. Not what I want, but it's the only thing that works OK. |
Make sure to look at the logic of that method, because it will throw errors if the heartbeat time is not configured correctly on both the client and the server. |
@Perni1984 Hi. |
@srgkas Can you post the code you use to replace the method |
Any progress on this issue? |
@arturslogins don't hold your breath! :) |
@vcraescu |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
The latest library version has improved heartbeat logic. There is also a Unix-signal-based heartbeat checker, which might solve your problems. Please report there if the issue still occurs. |
Can you explain the Unix signal heartbeat logic? Don't see anything in the docs about sending a heartbeat via Unix signal, and couldn't find it in the code either. (Edit) nevermind, found it in the php-amqplib repo. https://github.com/php-amqplib/php-amqplib#unix-signals |
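Editor's sketch of the general mechanism behind a signal-driven heartbeat, for readers who have not used it. This is plain PHP with the `pcntl` extension (CLI on Unix, PHP 7.1+), not php-amqplib's own implementation; `startSignalHeartbeat` and the `$sendHeartbeat` callback are hypothetical names, and in real use the callback would ask the AMQP connection to send a heartbeat frame.

```php
<?php
declare(strict_types=1);

/**
 * Arms a repeating SIGALRM timer that calls $sendHeartbeat roughly every
 * $intervalSeconds while the main script keeps working.
 * Returns a callable that disarms the timer again.
 * Hypothetical helper, for illustration only.
 */
function startSignalHeartbeat(int $intervalSeconds, callable $sendHeartbeat): callable
{
    // Let PHP dispatch signal handlers without declare(ticks) or manual polling.
    pcntl_async_signals(true);

    pcntl_signal(SIGALRM, function () use ($intervalSeconds, $sendHeartbeat): void {
        $sendHeartbeat();
        pcntl_alarm($intervalSeconds); // re-arm for the next beat
    });

    pcntl_alarm($intervalSeconds); // first beat

    return function (): void {
        pcntl_alarm(0);                 // cancel any pending alarm
        pcntl_signal(SIGALRM, SIG_DFL); // restore the default handler
    };
}
```

One caveat, as far as I understand the extension: the handler only runs when the PHP interpreter regains control, so a single long blocking call inside an extension (a slow query or HTTP request, say) can delay it past the heartbeat window.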
It doesn't work if I use long-running blocking operations (database operations, HTTP requests, etc.).
I get an error: |
Agreed, it doesn't. This only works if you have a set of blocking operations and every one of them completes within the heartbeat timeframe. |
We are trying to solve a problem we have in production. I'm here just to ask if someone has the same problem, or just confirm that this could be the case.
We have a heartbeat of 10s in the bundle configuration, and we have workers that take a long time to complete. The main reason is that they call a really slow external service that can take longer than 10 seconds, possibly even 50 or 100 seconds.
At the end of the worker we need to publish another message to RabbitMQ, and we always get error messages like:
The method is write, which (I suppose) gets called by heartbeat.
Now the question: is it possible that this library, while handling the worker's load, is busy and does not send any heartbeats? This could be the case because PHP is single-threaded, so it makes sense. But in this particular case, is it possible that the server closes the connection because of the lack of heartbeats, so that when we try to publish again the connection is already gone?
Any help is really appreciated!
Thanks.