Fix mets server: clean leftover zombie processes #1284

MehmedGIT · 2024-10-02T15:43:43Z

This PR fixes:

- No more zombie processes for METS Servers when they terminate.
- The Unix Domain Socket file is now deleted appropriately in more complicated configurations (e.g., UDS over TCP multiplexing).
- More extensive logging (to separate files as well) for:
  - ocrd_network.server_cache.locked_pages
  - ocrd_network.server_cache.processing_requests
  - ocrd.models.ocrd_mets.server.{self.url}
  - ocrd_network.tcp_to_uds_mets_proxy

Please make sure you adapt your logging configuration file appropriately to avoid extreme amounts of logs. The default logging configuration file may need an adaptation as well; I am open to discussion.
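On the zombie-process item above: a child process that has exited stays in the process table as a zombie until its parent waits on it. The following is a minimal sketch of that general pattern using Python's `subprocess` module. It is not the code from this PR, and `stop_and_reap` and its timeout are hypothetical names chosen for illustration.

```python
import subprocess
import sys


def stop_and_reap(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Ask a child process to stop, then wait on it so it is reaped.

    Hypothetical helper for illustration; not part of OCR-D.
    """
    proc.terminate()  # polite request (SIGTERM on POSIX)
    try:
        # wait() collects the exit status; without it, an exited child
        # lingers as a zombie for as long as the parent keeps running.
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # the child ignored the request, so force it
        return proc.wait()


if __name__ == "__main__":
    # Stand-in for a long-running server process.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit status:", stop_and_reap(child))
```

The point of the sketch is the `wait()` call: terminating a child without waiting on it is what leaves zombies behind.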

MehmedGIT · 2024-10-04T21:26:37Z

@joschrew, it would be great if you could test your Docker setup with this PR to see if there are any issues before it is merged.

kba

Excellent, this will make managing the PS much easier.

_exit was a terrible idea, obviously, my bad.

Besides some minor documentation issues, LGTM.

However, I am unsure about the additional logging. Not so much the new logs (which are appreciated) but the upgrade from log.debug to log.info. Since at least in my use case (Kubernetes) everything, including all separate log files, is redirected to a single STDOUT stream, this will make it hard to spot relevant log messages in all the noise. At the least, we should use different loggers for the very frequent messages (i.e. have a self.log/ocrd_network.processing_server for regular stuff and a self.verbose_log/ocrd_network.processing_server_verbose), so it is easier to configure in the ocrd_logging.conf and to filter by in the log viewer?
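The two-logger idea suggested here could look roughly like the sketch below. It is an illustration only: the `ExampleServer` class and its `handle` method are hypothetical, and only the two logger names are taken from the comment.

```python
import logging


class ExampleServer:
    """Hypothetical class; only the logger names come from the discussion."""

    def __init__(self) -> None:
        # Regular, low-volume messages.
        self.log = logging.getLogger("ocrd_network.processing_server")
        # High-frequency messages, on a logger that can be silenced on its own.
        self.verbose_log = logging.getLogger("ocrd_network.processing_server_verbose")

    def handle(self, request_id: str) -> None:
        self.verbose_log.info("received request %s", request_id)
        self.log.info("finished request %s", request_id)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Silence only the noisy logger; the regular one keeps logging at INFO.
    logging.getLogger("ocrd_network.processing_server_verbose").setLevel(logging.WARNING)
    ExampleServer().handle("abc")
```

Because the two names are siblings rather than parent and child, changing the level of one does not affect the other, which also makes them easy to filter separately in a log viewer.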

src/ocrd/mets_server.py

src/ocrd_network/runtime_data/deployer.py

kba

> However, I am unsure about the additional logging. Not so much the new logs (which are appreciated) but the upgrade from log.debug to log.info. Since at least in my use case (Kubernetes) everything, including all separate log files, is redirected to a single STDOUT stream, this will make it hard to spot relevant log messages in all the noise. At the least, we should use different loggers for the very frequent messages (i.e. have a self.log/ocrd_network.processing_server for regular stuff and a self.verbose_log/ocrd_network.processing_server_verbose), so it is easier to configure in the ocrd_logging.conf and to filter by in the log viewer?

The logging is already distributed over different loggers, so that part is not an issue.

I can also live with the debug->info change; the intention is to not force users to set the global log level to debug to see these messages. In any case, it can be adapted in the logging.conf if desired.

I'll be deploying this now and it is good to merge once @joschrew okays it.
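One way to adapt levels per logger, as mentioned above, is sketched here with Python's standard `logging.config.dictConfig`. This is an illustration under assumptions rather than the project's `ocrd_logging.conf`: the chosen levels are arbitrary examples, and the per-server logger `ocrd.models.ocrd_mets.server.{self.url}` is covered through its parent name because the URL part varies.

```python
import logging
import logging.config

# Logger names as listed in the PR description; the levels are example choices.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler", "level": "DEBUG"},
    },
    "root": {"level": "INFO", "handlers": ["console"]},
    "loggers": {
        "ocrd_network.server_cache.locked_pages": {"level": "WARNING"},
        "ocrd_network.server_cache.processing_requests": {"level": "WARNING"},
        "ocrd_network.tcp_to_uds_mets_proxy": {"level": "WARNING"},
        # Parent of the per-URL METS Server loggers, which inherit its level.
        "ocrd.models.ocrd_mets.server": {"level": "WARNING"},
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
```

With this configuration, the four verbose loggers only emit warnings and above, while everything else still logs at INFO through the root logger.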

joschrew

This is working for me. It works because of the changes in #1277: there, the socket file is deleted in Deployer.start_uds_mets_server when starting the METS Server for a workspace if the socket file already exists.
I need this because the socket file is not deleted when the Docker containers are shut down. It seems to me that the shutdown method of OcrdMetsServer is not called: I couldn't find any of the log entries that this method should have issued. But everything works for me as it is.
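The fallback described here, removing a leftover socket file before starting a server on the same path, can be sketched as follows. This is not the code of `Deployer.start_uds_mets_server`; `remove_stale_socket` is a hypothetical helper written only to illustrate the idea.

```python
from pathlib import Path


def remove_stale_socket(socket_path: Path) -> bool:
    """Delete a leftover socket file; return True if one was removed.

    Hypothetical helper for illustration; not part of OCR-D.
    """
    try:
        socket_path.unlink()
        return True
    except FileNotFoundError:
        # Nothing was left behind by a previous run.
        return False
```

Attempting the deletion and handling `FileNotFoundError` avoids a separate existence check, so there is no window between checking and deleting.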

MehmedGIT · 2024-10-08T08:23:19Z

> There, the socket file is deleted in Deployer.start_uds_mets_server when starting the METS Server for a workspace if the socket file already exists. I need this because the socket file is not deleted when the Docker containers are shut down. It seems to me that the shutdown method of OcrdMetsServer is not called: I couldn't find any of the log entries that this method should have issued. But everything works for me as it is.

Strangely, it seems the DELETE request is never sent to the METS Server in the Docker environment. The socket file is volume-mapped to the host OS, and it is deleted through a shutdown method call by the METS Server process that resides inside a container. So the socket file would not be removed even if the shutdown method were invoked correctly. It seems that not much can be done for the Docker environment other than the fallback solution you described.

EDIT: Works just fine after 34bfbf4 and ab660fb without the fallback solution.


MehmedGIT force-pushed the fix_mets_server_zombies branch 2 times, most recently from e5ac317 to 569edff Compare October 4, 2024 11:34

previous state

7b6552b

MehmedGIT force-pushed the fix_mets_server_zombies branch from 569edff to 7b6552b Compare October 4, 2024 11:36

MehmedGIT added 17 commits October 4, 2024 13:37

Merge branch 'network_client_block_prints' into fix_mets_server_zombies

e3f5949

do not use pid killing

637a40e

add logger param to stop mets server

387dc30

add extensive logging to mets proxy

07953f7

return empty response type earlier

3a9e147

fix: change UDS file deletion place

00655b8

return response from mets server before dying

810f811

fix: remove UDS file correctly

4970e62

comment out irrelevant code

906766d

fix: no more zombies, yay!

a87a2e1

add: extensive logging of mets server to file

e0ff4eb

change cache debug -> info for extensive logging to file

53c8f3f

set log from info to debug

fe41223

fix: typo

55c2f63

improve: delete socket file more appropriately

bf6616f

remove: unnecessary code

bc8a03b

fix: .__dict__ of {}

303488a

MehmedGIT marked this pull request as ready for review October 4, 2024 21:25

MehmedGIT requested review from kba and joschrew October 4, 2024 21:25

kba requested changes Oct 7, 2024

kba approved these changes Oct 7, 2024

joschrew approved these changes Oct 8, 2024

Update src/ocrd/mets_server.py

c8e0c73

Co-authored-by: Konstantin Baierer <kba@users.noreply.github.com>

MehmedGIT and others added 14 commits October 8, 2024 10:24

Update src/ocrd/mets_server.py

2cd4a64

Co-authored-by: Konstantin Baierer <kba@users.noreply.github.com>

Update src/ocrd/mets_server.py

44a8ceb

Co-authored-by: Konstantin Baierer <kba@users.noreply.github.com>

Update src/ocrd_network/runtime_data/deployer.py

61c683f

Co-authored-by: Konstantin Baierer <kba@users.noreply.github.com>

remove unnecessary method

5055309

fix: make stop() and ..reload..() sync

34bfbf4

fix: stop mets server when no cached requests

ab660fb

clean: remove pid kill flag in stop mets server

148f8d4

extend log: server cache requests

dacd325

improve: sleep no longer needed

05ded73

add new env: OCRD_NETWORK_RABBITMQ_HEARTBEAT

5d755a8

fix: empty -> text

c5c60fd

deployer: remove METS Server path and url from their resp. caches on …

e1b9784

…stopping

Merge branch 'fix_mets_server_zombies' into deployer-mets-caching

47c9acf

# Conflicts: # src/ocrd_network/runtime_data/deployer.py

Merge pull request #1287 from OCR-D/deployer-mets-caching

926cb97

deployer: remove METS Server path and url from their resp. caches on …

Base automatically changed from network_client_block_prints to master October 10, 2024 10:37

kba added 2 commits October 10, 2024 12:41

Simplify description for OCRD_NETWORK_RABBITMQ_HEARTBEAT

7f60559

Merge pull request #1285 from OCR-D/rabbitmq_heartbeat_env

3e736a7

Rabbitmq heartbeat env

kba merged commit 9391f49 into master Oct 10, 2024
22 checks passed

kba deleted the fix_mets_server_zombies branch October 10, 2024 11:09
