Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix server stop/shutdown hang by stopping the FileWatcher when unloading configs. #227

Closed
wants to merge 1 commit into from

Conversation

LeadAssimilator
Copy link

@LeadAssimilator LeadAssimilator commented Dec 14, 2024

The FileWatcher needs to be stopped somewhere otherwise the jvm will hang on exit because it is keeping a ScheduledExecutorService thread pool alive with non-daemon threads. Since ConfigTracker.unloadConfigs is called during server stop, it seems like an ok place to also stop the FileWatcher.

Note that the FileWatcher's ScheduledExecutorService thread pool will on demand create threads when work is scheduled to the pool. If no config files the watcher is watching change, then no work will be scheduled and the pool will remain at 0 threads. However, once a config file changes, the watcher will schedule work to the pool and a non-daemon thread will be created. Once this occurs, the non-daemon thread from the pool will keep jvm alive and prevent it from exiting after trying to gracefully stop the server. The jvm must be killed from this point on, unless the FileWatcher is told to stop so that it can teardown the pool.

I think this unfortunately affects all versions, so it should probably be merged everywhere.

The FileWatcher needs to be stopped somewhere otherwise the jvm will hang on exit because it is keeping a ScheduledExecutorService thread pool alive with non-daemon threads. Since ConfigTracker.unloadConfigs is called during server stop, it seems like an ok place to also stop the FileWatcher.
@neoforged-pr-publishing
Copy link

  • Publish PR to GitHub Packages

@Technici4n
Copy link
Member

Does neoforged/NeoForge#1757 not fix the issue already?

@LeadAssimilator
Copy link
Author

Maybe. I'll have to update NF and test. I didn't see it land there or would even think to look there given FML is the one using FileWatcher and this seems more like an FML problem.

Even if it works, doesn't it make more sense for FML to teardown the things it creates/uses? Spreading out fixes across projects just feels wrong, unless there is a really good reason to do so. The other part of that change interrupting the pinger makes sense though since it is instantiated in the patch, but I can't find references to FileWatcher outside of FML so it arguably should be FMLs responsibility to cleanup.

I don't think there is any harm calling stop twice either if this PR is accepted.

@Matyrobbrt
Copy link
Member

This is not the correct place to put such hook. This PR would cause unloading one set of configs to immediately terminate the watcher and therefore affect the other configs (i.e. integrated server shutdown causing client config watches to stop).
I think the NeoForge PR suffices as config types can be loaded and unloaded at any time, but if you must, you will need to check that you only shut down the file watcher when the last config type is unloaded.

@Technici4n
Copy link
Member

I don't think there is any harm calling stop twice either if this PR is accepted.

Accepting this PR if it does nothing would make it difficult to understand why this call exists when we have too look at it again in the future. It would harm readability of the code, and therefore lead to net negative impact. 😉

@LeadAssimilator
Copy link
Author

This is not the correct place to put such hook. This PR would cause unloading one set of configs to immediately terminate the watcher and therefore affect the other configs (i.e. integrated server shutdown causing client config watches to stop). I think the NeoForge PR suffices as config types can be loaded and unloaded at any time, but if you must, you will need to check that you only shut down the file watcher when the last config type is unloaded.

Right, good catch, my bad. I didn't think about the behavior for non-dedicated servers. I'll rework.

@LeadAssimilator
Copy link
Author

I don't think there is any harm calling stop twice either if this PR is accepted.

Accepting this PR if it does nothing would make it difficult to understand why this call exists when we have too look at it again in the future. It would harm readability of the code, and therefore lead to net negative impact. 😉

It wouldn't need to be in both places and the NF call to stop should be removed if this one is eventually accepted after I rework it. I was simply stating it didn't hurt to call it twice as a means to mitigate risk and hint that this could be accepted before the other one is removed. But I'm also basically making your same argument, that the fix being in NF instead of FML is confusing and hurts readability.

@TelepathicGrunt
Copy link
Contributor

Pls test the latest neoforge as Sci’s fix should address the issue. Remember, this is a temporary fix until Nightconfig properly fixes the issue
TheElectronWill/night-config#187

Once properly fixed in Nightconfig, we will yeet Sci’s temporary fix and all will be well again.

As long as Sci’s fix is enough, I don’t see any need to mess with FML as well

@Technici4n
Copy link
Member

It wouldn't need to be in both places and the NF call to stop should be removed if this one is eventually accepted after I rework it.

Ah I see now

@LeadAssimilator
Copy link
Author

Confirmed the fix in NF works.

To address the original problem with this PR, the better approach would be to add a separate method to ConfigTracker that calls FileWatcher.stop rather than in unloadConfigs and call that from ServiceLifecycleHooks only in the dedicated server case. That would avoid the problem with non-dedicated servers losing their watches and wouldn't require any annoying synchronization if trying to otherwise track active watches. Doing so is actually what I intended, but I didn't realize the problems calling stop in unloadConfigs would cause in the non-dedicated case. Now doing it the problem-free way means it is effectively the same as the NF patch just in FML instead.

While I still think it is objectively better to have the fix in FML since NF isn't using FileWatcher and thus it would have much better spatial locality and just overall make much more sense, I'm going to abandon this PR because it just isn't worth it if everyone already knows about this hack in NF and is prepared to remove it once FileWatcher is fixed.

@LeadAssimilator LeadAssimilator deleted the patch-1 branch December 14, 2024 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants