Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix mets server: clean leftover zombie processes #1284

Merged
merged 35 commits into from
Oct 10, 2024
Merged

Conversation

MehmedGIT
Copy link
Contributor

@MehmedGIT MehmedGIT commented Oct 2, 2024

This PR fixes:

  • No more zombie processes for Mets Servers when they terminate!
  • Appropriately deleting the Unix Domain Socket in more complicated configurations (e.g., UDS over TCP multiplexing)
  • More extensive logging (to separate files as well) for:
    • ocrd_network.server_cache.locked_pages
    • ocrd_network.server_cache.processing_requests
    • ocrd.models.ocrd_mets.server.{self.url}
    • ocrd_network.tcp_to_uds_mets_proxy

Please make sure you adapt your logging configuration file appropriately to avoid extreme amounts of logs. Maybe an adaption is needed to the default logging configuration file - open for discussions.

@MehmedGIT MehmedGIT force-pushed the fix_mets_server_zombies branch 2 times, most recently from e5ac317 to 569edff Compare October 4, 2024 11:34
@MehmedGIT MehmedGIT force-pushed the fix_mets_server_zombies branch from 569edff to 7b6552b Compare October 4, 2024 11:36
@MehmedGIT MehmedGIT marked this pull request as ready for review October 4, 2024 21:25
@MehmedGIT MehmedGIT requested review from kba and joschrew October 4, 2024 21:25
@MehmedGIT
Copy link
Contributor Author

@joschrew, would be great if you could test your docker setup with this PR to see if there are any issues before this is merged.

Copy link
Member

@kba kba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, this will make managing the PS much easier.

_exit was a terrible idea, obviously, my bad.

Besides some minor documentation issues, LGTM.

However, I am unsure about the additional logging. Not so much the new logs (which are appreciated) but the upgrading from log.debug to log.info. Since at least in my use case (kubernetes), everything, including all separate log files, is redirected to a single STDOUT stream, this will make it hard to spot relevant logging in all the noise. At least we should use different loggers for the very frequent loggings (i.e. have a self.log/ocrd_network.processing_server for regular stuff and self.verbose_log/ocrd_network.processing_server_verbose), so it is easier to configure in the ocrd_logging.conf and to filter by in the log viewer?

src/ocrd/mets_server.py Outdated Show resolved Hide resolved
src/ocrd/mets_server.py Outdated Show resolved Hide resolved
src/ocrd/mets_server.py Outdated Show resolved Hide resolved
src/ocrd/mets_server.py Outdated Show resolved Hide resolved
src/ocrd_network/runtime_data/deployer.py Outdated Show resolved Hide resolved
Copy link
Member

@kba kba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I am unsure about the additional logging. Not so much the new logs (which are appreciated) but the upgrading from log.debug to log.info. Since at least in my use case (kubernetes), everything, including all separate log files, is redirected to a single STDOUT stream, this will make it hard to spot relevant logging in all the noise. At least we should use different loggers for the very frequent loggings (i.e. have a self.log/ocrd_network.processing_server for regular stuff and self.verbose_log/ocrd_network.processing_server_verbose), so it is easier to configure in the ocrd_logging.conf and to filter by in the log viewer?

The logging is already distributed over different loggers, so that part is not an issue.

And the debug->info I can also live with, the intention is to not force users to set global log level to debug to see these. In any case, it can be adapted in the logging.conf if desired.

I'll be deploying this now and it is good to merge once @joschrew okays it.

Copy link
Contributor

@joschrew joschrew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is working for me. The reason it works are the changes in #1277. There the socket-file is deleted in Deployer.start_uds_mets_server when starting the mets_server for a workspace and the socket-file is existing.
This is necessary for me to work because when shutting down the docker containers the socket file is not deleted. It seems to me, that the shutdown Method of OcrdMetsServer is not called, I couldn't find any log entry, which should have been issued in this method. But that's ok for everything to work for me.

@MehmedGIT
Copy link
Contributor Author

MehmedGIT commented Oct 8, 2024

There the socket-file is deleted in Deployer.start_uds_mets_server when starting the mets_server for a workspace and the socket-file is existing. This is necessary for me to work because when shutting down the docker containers the socket file is not deleted. It seems to me, that the shutdown Method of OcrdMetsServer is not called, I couldn't find any log entry, which should have been issued in this method. But that's ok for everything to work for me.

Strangely, it seems the DELETE request is never sent to the mets server in the docker environment. The socket file is volume mapped on the host OS, and deleted with a shutdown method call by the Mets server process that resides inside a container. So the socket file would not be removed even if the shutdown method was invoked correctly. It seems that not much can be done for the docker environment other than the fallback solution you talked about.

EDIT: Works just fine after 34bfbf4 and ab660fb without the fallback solution.

Co-authored-by: Konstantin Baierer <kba@users.noreply.github.com>
Base automatically changed from network_client_block_prints to master October 10, 2024 10:37
@kba kba merged commit 9391f49 into master Oct 10, 2024
22 checks passed
@kba kba deleted the fix_mets_server_zombies branch October 10, 2024 11:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants