Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues introduced by PR #2113 (SDL2) #2169

Closed
misl6 opened this issue May 3, 2020 · 4 comments · Fixed by #2180
Closed

Issues introduced by PR #2113 (SDL2) #2169

misl6 opened this issue May 3, 2020 · 4 comments · Fixed by #2180
Labels

Comments

@misl6
Copy link
Member

misl6 commented May 3, 2020

As said on Discord #dev channel yesterday, PR #2113 introduces a lot of blocking issues.

These are the results of the tests done by me, @AndreMiras and @opacam :

  • sdl2==2.0.10 have issues that have been solved by the SDL2 team, so it needs to be bumped to 2.0.12.

  • sdl2==2.0.12 works but create freezes during runtime.

  • These freezes are definitely related to the new SDL_LockMutex / SDL_UnlockMutex mechanism they added for concurrency issues.

  • Commenting SDL_LockMutex on Touch related events fixes the freeze issue for non-fullscreen apps.

  • On fullscreen apps, the patch it's also needed on Resize, .. etc events.

I'm providing an attached patch that fixes the issues on top of 2.0.12, btw seems not a good idea to do that, so it needs some more investigation:

disable_mutex.txt

@AndreMiras
Copy link
Member

Thanks for the detailed report ♥️
This is also related to #2167

@AndreMiras AndreMiras added the bug label May 3, 2020
@inclement
Copy link
Member

I would have no objection to applying this patch on top of 2.0.12 as a stopgap, if it seems to work. It can't be worse than just not working at all!

@ghost
Copy link

ghost commented May 7, 2020

So, I looked into the SDL2 code, since I was curious to find out what is possibly going on with these freezes.

My belief is that these mutexes are by design very likely to break even with correct SDL2 usage by kivy, and here is why: I looked at a particular function with these new mutexes, and just looked at what it possibly does:

 JNIEXPORT void JNICALL SDL_JAVA_INTERFACE(onNativeResize)(
                                     JNIEnv *env, jclass jcls)

If you look at that function, it will do the following:

  1. it locks the mutex: SDL_LockMutex(Android_ActivityMutex);

  2. it calls other things, and I just followed a made-up call chain by following a call in each function:

    src/core/android.c: onNativeResize calls Android_SendResize()
    src/video/android/SDL_androidvideo.c: Android_SendResize calls SDL_SendWindowEvent()
    src/events/SDL_windowevents.c: SDL_SendWindowEvent calls SDL_OnWindowFocusLost()
    src/video/SDL_video.c: SDL_OnWindowFocusLost calls SDL_MinimizeWindow()
    src/video/SDL_video.c: SDL_MinimizeWindow calls SDL_UpdateFullscreenMode()
    src/video/SDL_Video.c: SDL_UpdateFullscreenMode calls _this->SetWindowFullscreen() = Android_SetWindowFullscreen
    src/video/android/SDL_androidvideo.c: Android_SetWindowFullscreen runs
    

    Now what's the first line of Android_SetWindowFullscreen? It's this one: SDL_LockMutex(Android_ActivityMutex); (remember we're still coming from a function that already locked it.)

Now this particular call chain might not be directly possible, since it would go from a resize to a minimize. But my point is, a similar call chain with a result like this very likely exists. Update: I just realized SDL_LockMutex() is recursive, so this call chain would need to somehow jump threads. However, maybe Android's Java API does make a thread invoke another internally for some operations? I can't imagine anything else causing this... nevertheless, the call chain above isn't possibly the cause but there likely is one that goes through a JNI call that looks a bit alike

These functions SDL2 guards code which not only post user events (which is safe) but which also may cause inherently complex operations in SDL2 as a result to things, e.g. minimize the window on focus loss which in itself accesses the activity again. It seems almost impossible to me that SDL2 really eliminates all such "back flows" with the current guard design, therefore inevitably causing such freezes in corner cases.

Now what to do instead is tricky: maybe SDL2 could just push the event to an internal extra queue, immediately free the mutex again, then process these from the C main thread somehow with the mutex initially not locked (e.g. when SDL_PollEvent() is called, before the "normal" user-facing queue is processed and returned) such that these things still get processed before the user gets their resulting events, but not nested while the mutex is still held. I'm sure the SDL2 devs might have a better idea.

But this is just a guess from my short look at the code. Likely there are better ways to fix this than what I suggested. Maybe there don't exist such call paths at all, but honestly from the complexity of the triggered code and what it can internally call I find it very likely that such chains exist, and that this is the culprit. And it seems unlikely they can be fully avoided unless the guard design is changed to not guard the complex "reaction" code rather than just event queue posting.

In any case, I recommend making an SDL2 bugtracker ticket and pointing this likely problem out. (You are free to link this comment of mine if you want.)

AndreMiras added a commit to AndreMiras/python-for-android that referenced this issue May 8, 2020
Actually simply reverted few files:
```sh
git checkout master -- \
    pythonforandroid/recipes/sdl2/__init__.py \
    pythonforandroid/bootstraps/sdl2/build/src/patches/SDLActivity.java.patch \
    pythonforandroid/bootstraps/sdl2/build/src/main/java/org/kivy/android/PythonActivity.java
```
- fixes kivy#2167
- fixes kivy#2169
@ghost
Copy link

ghost commented May 9, 2020

There is now a ticket here: https://bugzilla.libsdl.org/show_bug.cgi?id=5129

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants