CommandBufferGL::drawElements Crash #1211

Closed
kiranb47 opened this issue May 24, 2023 · 138 comments · Fixed by #1504
Labels
HelpDesk wontfix This will not be worked on

@kiranb47

  • axmol version: Latest dev branch
  • devices test on:
  • developing environments
    • NDK version: r19c
    • Xcode version: 12.4
    • Visual Studio:
      • VS version: 2019 (16.11), 2022 (17.4)
      • MSVC version: 1929, 1934
      • Windows SDK version: 10.0.22621.0
    • cmake version:

Steps to Reproduce:

We migrated our Match Animal game from the latest Cocos2d-x v3 to Axmol yesterday and are now getting this crash. How to fix this issue?

backtrace:
#00 pc 0x00000000005ffa58 /data/app/games.spearmint.matchanimal-ayOwq5oTYpeKYIRIPMWVUQ==/lib/arm64/libMatchAnimal.so (ax::Renderer::TriangleCommandBufferManager::createBuffer()+1681)
#1 pc 0x00000000005fdbc0 /data/app/games.spearmint.matchanimal-ayOwq5oTYpeKYIRIPMWVUQ==/lib/arm64/libMatchAnimal.so (ax::Renderer::init()+964)
#2 pc 0x000000000063b060 /data/app/games.spearmint.matchanimal-ayOwq5oTYpeKYIRIPMWVUQ==/lib/arm64/libMatchAnimal.so (ax::Director::setOpenGLView(ax::GLView*)+394)
#3 pc 0x00000000005640cc /data/app/games.spearmint.matchanimal-ayOwq5oTYpeKYIRIPMWVUQ==/lib/arm64/libMatchAnimal.so (Java_org_axmol_lib_AxmolRenderer_nativeInit+99)
#4 pc 0x00000000001c5ff0 /data/app/games.spearmint.matchanimal-ayOwq5oTYpeKYIRIPMWVUQ==/oat/arm64/base.odex

@rh101
Contributor

rh101 commented May 25, 2023

How to fix this issue?

At risk of stating the obvious, have you actually attempted to use a debugger or other methods of tracking down why it crashes? A breakpoint in the right place may be the only thing you need to find out the issue. You've provided no source code, no context to the crash, whether it's on a real device or emulator, or what the app was doing at the time.

It's most likely something in your own code causing this issue, and unlikely to be something in the game engine. Given that you ported it from what I can only assume is a working version of the app, something like this must be trivial for you to fix.

@kiranb47
Author

Hey @rh101 The issue is not reproducible on any of our test devices. Crash logs are from Google Play live users. 3.6% of our GPLAY users have the issue.

@kiranb47
Author

Another similar crash:

[split_config.armeabi_v7a.apk!libMatchAnimal.so] CommandBufferGL.cpp - ax::backend::CommandBufferGL::drawArrays(ax::backend::PrimitiveType, unsigned int, unsigned int, bool)
SIGSEGV
Last 28 days (Apr 27, 2023 - May 25, 2023)
Affected users: 9
Events: 20
Last occurred: 3 hours ago
Last updated: Today, 10:30 AM
By app version: 10205 (1.2.5)
By Android version: Android 11 (SDK 30), Android 12 (SDK 31)
By device: samsung a10s, samsung a02, Redmi ice, TCL Cruze_Lite
By issue visibility: Foreground

Sample attributes: samsung a10s (Galaxy A10s), Android 11 (SDK 30), Version: 10205 (1.2.5), Occurred: 3 hours ago

pid: 0, tid: 10195 >>> games.spearmint.matchanimal <<<

backtrace:
#00 pc 0x00000000000701ae /vendor/lib/egl/libGLESv2_mtk.so
#1 pc 0x0000000000089ca1 /vendor/lib/egl/libGLESv2_mtk.so
#2 pc 0x00000000000888d9 /vendor/lib/egl/libGLESv2_mtk.so (glDrawArrays+2592)
#3 pc 0x000000000002d549 /vendor/lib/egl/libGLES_meow.so (MEOW::meow_call_ddk_gl_2_glDrawArrays(unsigned int, int, int)+20)
#4 pc 0x00000000004255d7 /data/app/~~ZtkX28x-dzXdBYditxORXg==/games.spearmint.matchanimal-6vE2kmMTB0DCfes9iuUpeg==/split_config.armeabi_v7a.apk!libMatchAnimal.so (ax::backend::CommandBufferGL::drawArrays(ax::backend::PrimitiveType, unsigned int, unsigned int, bool)+222)
#5 pc 0x000000000040feef /data/app/~~ZtkX28x-dzXdBYditxORXg==/games.spearmint.matchanimal-6vE2kmMTB0DCfes9iuUpeg==/split_config.armeabi_v7a.apk!libMatchAnimal.so (ax::Renderer::drawCustomCommand(ax::RenderCommand*)+758)
#6 pc 0x000000000040ff3d /data/app/~~ZtkX28x-dzXdBYditxORXg==/games.spearmint.matchanimal-6vE2kmMTB0DCfes9iuUpeg==/split_config.armeabi_v7a.apk!libMatchAnimal.so (ax::Renderer::doVisitRenderQueue(std::__ndk1::vector<ax::RenderCommand*, std::__ndk1::allocator<ax::RenderCommand*>> const&)+398)
#7 pc 0x000000000040f7d5 /data/app/~~ZtkX28x-dzXdBYditxORXg==/games.spearmint.matchanimal-6vE2kmMTB0DCfes9iuUpeg==/split_config.armeabi_v7a.apk!libMatchAnimal.so (ax::Renderer::visitRenderQueue(ax::RenderQueue&)+386)
#8 pc 0x000000000040f539 /data/app/~~ZtkX28x-dzXdBYditxORXg==/games.spearmint.matchanimal-6vE2kmMTB0DCfes9iuUpeg==/split_config.armeabi_v7a.apk!libMatchAnimal.so (ax::Renderer::processGroupCommand(ax::GroupCommand*)+292)
#9 pc 0x000000000040f8e3 /data/app/~~ZtkX28x-dzXdBYditxORXg==/games.spearmint.matchanimal-6vE2kmMTB0DCfes9iuUpeg==/split_config.armeabi_v7a.apk!libMatchAnimal.so (ax::Renderer::processRenderCommand(ax::RenderCommand*)+342)
#10 pc 0x000000000040ff3d /data/app/~~ZtkX28x-dzXdBYditxORXg==/games.spearmint.matchanimal-6vE2kmMTB0DCfes9iuUpeg==/split_config.armeabi_v7a.apk!libMatchAnimal.so (ax::Renderer::doVisitRenderQueue(std::__ndk1::vector<ax::RenderCommand*, std::__ndk1::allocator<ax::RenderCommand*>> const&)+398)
#11 pc 0x000000000040f7d5 /data/app/~~ZtkX28x-dzXdBYditxORXg==/games.spearmint.matchanimal-6vE2kmMTB0DCfes9iuUpeg==/split_config.armeabi_v7a.apk!libMatchAnimal.so (ax::Renderer::visitRenderQueue(ax::RenderQueue&)+386)
#12 pc 0x000000000040ff8b /data/app/~~ZtkX28x-dzXdBYditxORXg==/games.spearmint.matchanimal-6vE2kmMTB0DCfes9iuUpeg==/split_config.armeabi_v7a.apk!libMatchAnimal.so (ax::Renderer::render()+415)
#13 pc 0x00000000003d64b1 /data/app/~~ZtkX28x-dzXdBYditxORXg==/games.spearmint.matchanimal-6vE2kmMTB0DCfes9iuUpeg==/split_config.armeabi_v7a.apk!libMatchAnimal.so (ax::Scene::render(ax::Renderer*, ax::Mat4 const&, ax::Mat4 const*)+225)
#14 pc 0x000000000043953d /data/app/~~ZtkX28x-dzXdBYditxORXg==/games.spearmint.matchanimal-6vE2kmMTB0DCfes9iuUpeg==/split_config.armeabi_v7a.apk!libMatchAnimal.so (ax::Director::drawScene()+295)
#15 pc 0x000000000043af85 /data/app/~~ZtkX28x-dzXdBYditxORXg==/games.spearmint.matchanimal-6vE2kmMTB0DCfes9iuUpeg==/split_config.armeabi_v7a.apk!libMatchAnimal.so (ax::Director::mainLoop()+1485)
#16 pc 0x00000000000ad5d7 /data/app/~~ZtkX28x-dzXdBYditxORXg==/games.spearmint.matchanimal-6vE2kmMTB0DCfes9iuUpeg==/oat/arm/base.odex (art_jni_trampolin

@rh101
Contributor

rh101 commented May 25, 2023

Crash logs are from Google Play live users. 3.6% of our GPLAY users have the issue.

Is it limited to specific devices or Android versions, and if so, what devices and Android versions are having this issue?

I notice the armeabi_v7a in that dump. Is the crash limited to 32bit devices running armeabi_v7a?

Also, please put code tags around the crash dumps so they're formatted and easier to read. Just add 3 backtick ` symbols before and after the crash dumps.

@kiranb47
Author

@rh101 The issue also occurs on arm64_v8a devices.

[split_config.arm64_v8a.apk!libMatchAnimal.so] CommandBufferGL.cpp - ax::backend::CommandBufferGL::drawElements(ax::backend::PrimitiveType, ax::backend::IndexFormat, unsigned long, unsigned long, bool)

backtrace:
  #00  pc 0x0000000000088b78  /apex/com.android.runtime/lib64/bionic/libc.so (__memcpy+232)
  #01  pc 0x00000000001eb86c  /vendor/lib64/egl/libGLESv2_adreno.so (!!!0000!6642a709aefb598e3c7ff817eaf381!03dd3ba!+3524)
  #02  pc 0x0000000000360b08  /vendor/lib64/egl/libGLESv2_adreno.so (!!!0000!928ddd272828b5ecd302b519177d41!03dd3ba!+904)
  #03  pc 0x0000000000332b54  /vendor/lib64/egl/libGLESv2_adreno.so (!!!0000!4c38aea95be9faf0b3861d1af73f50!03dd3ba!+2916)
  #04  pc 0x0000000000320078  /vendor/lib64/egl/libGLESv2_adreno.so (!!!0000!591ab8a9d75351b2e63b06236cc5c1!03dd3ba!+16)
  #05  pc 0x0000000000133bfc  /vendor/lib64/egl/libGLESv2_adreno.so (!!!0000!6b200851123c7898055fe62ff9f71f!03dd3ba!+1876)
  #06  pc 0x000000000012be84  /vendor/lib64/egl/libGLESv2_adreno.so (!!!0000!77df12deb6a622478efa8fb9929034!03dd3ba!+1004)
  #07  pc 0x000000000061f8a8  /data/app/~~jlm2IhawQGELdTRWJwWbqw==/games.spearmint.matchanimal-oUvgtVnie2xt30tW0BFwmg==/split_config.arm64_v8a.apk!libMatchAnimal.so (ax::backend::CommandBufferGL::drawElements(ax::backend::PrimitiveType, ax::backend::IndexFormat, unsigned long, unsigned long, bool)+242)
  #08  pc 0x00000000005fee60  /data/app/~~jlm2IhawQGELdTRWJwWbqw==/games.spearmint.matchanimal-oUvgtVnie2xt30tW0BFwmg==/split_config.arm64_v8a.apk!libMatchAnimal.so (ax::Renderer::drawCustomCommand(ax::RenderCommand*)+752)
  #09  pc 0x00000000005fef04  /data/app/~~jlm2IhawQGELdTRWJwWbqw==/games.spearmint.matchanimal-oUvgtVnie2xt30tW0BFwmg==/split_config.arm64_v8a.apk!libMatchAnimal.so (ax::Renderer::doVisitRenderQueue(std::__ndk1::vector<ax::RenderCommand*, std::__ndk1::allocator<ax::RenderCommand*>> const&)+398)
  #10  pc 0x00000000005fe4b0  /data/app/~~jlm2IhawQGELdTRWJwWbqw==/games.spearmint.matchanimal-oUvgtVnie2xt30tW0BFwmg==/split_config.arm64_v8a.apk!libMatchAnimal.so (ax::Renderer::visitRenderQueue(ax::RenderQueue&)+386)
  #11  pc 0x00000000005fe27c  /data/app/~~jlm2IhawQGELdTRWJwWbqw==/games.spearmint.matchanimal-oUvgtVnie2xt30tW0BFwmg==/split_config.arm64_v8a.apk!libMatchAnimal.so (ax::Renderer::processGroupCommand(ax::GroupCommand*)+292)
  #12  pc 0x00000000005fe644  /data/app/~~jlm2IhawQGELdTRWJwWbqw==/games.spearmint.matchanimal-oUvgtVnie2xt30tW0BFwmg==/split_config.arm64_v8a.apk!libMatchAnimal.so (ax::Renderer::processRenderCommand(ax::RenderCommand*)+342)
  #13  pc 0x00000000005fef04  /data/app/~~jlm2IhawQGELdTRWJwWbqw==/games.spearmint.matchanimal-oUvgtVnie2xt30tW0BFwmg==/split_config.arm64_v8a.apk!libMatchAnimal.so (ax::Renderer::doVisitRenderQueue(std::__ndk1::vector<ax::RenderCommand*, std::__ndk1::allocator<ax::RenderCommand*>> const&)+398)
  #14  pc 0x00000000005fe4b0  /data/app/~~jlm2IhawQGELdTRWJwWbqw==/games.spearmint.matchanimal-oUvgtVnie2xt30tW0BFwmg==/split_config.arm64_v8a.apk!libMatchAnimal.so (ax::Renderer::visitRenderQueue(ax::RenderQueue&)+386)
  #15  pc 0x00000000005fef74  /data/app/~~jlm2IhawQGELdTRWJwWbqw==/games.spearmint.matchanimal-oUvgtVnie2xt30tW0BFwmg==/split_config.arm64_v8a.apk!libMatchAnimal.so (ax::Renderer::render()+415)
  #16  pc 0x00000000005b17b8  /data/app/~~jlm2IhawQGELdTRWJwWbqw==/games.spearmint.matchanimal-oUvgtVnie2xt30tW0BFwmg==/split_config.arm64_v8a.apk!libMatchAnimal.so (ax::Scene::render(ax::Renderer*, ax::Mat4 const&, ax::Mat4 const*)+225)
  #17  pc 0x000000000063a760  /data/app/~~jlm2IhawQGELdTRWJwWbqw==/games.spearmint.matchanimal-oUvgtVnie2xt30tW0BFwmg==/split_config.arm64_v8a.apk!libMatchAnimal.so (ax::Director::drawScene()+295)
  #18  pc 0x000000000063ca68  /data/app/~~jlm2IhawQGELdTRWJwWbqw==/games.spearmint.matchanimal-oUvgtVnie2xt30tW0BFwmg==/split_config.arm64_v8a.apk!libMatchAnimal.so (ax::Director::mainLoop()+1485)
  #19  pc 0x00000000000ada90  /data/app/~~jlm2IhawQGELdTRWJwWbqw==/games.spearmint.matchanimal-oUvgtVnie2xt30tW0BFwmg==/oat/arm64/base.odex (art_jni_trampoline+144)
  #20  pc 0x0000000002018c7c  /memfd:jit-cache (org.axmol.lib.AxmolRenderer.onDrawFrame+220)
  #21  pc 0x000000000069f968  /system/framework/arm64/boot-framework.oat (android.opengl.GLSurfaceView$GLThread.guardedRun+3992)
  #22  pc 0x00000000006a01b0  /system/framework/arm64/boot-framework.oat (android.opengl.GLSurfaceView$GLThread.run+224)
  #23  pc 0x0000000000134564  /apex/com.android.art/lib64/libart.so (art_quick_invoke_stub+548)
  #24  pc 0x0000000000198e94  /apex/com.android.art/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+204)
  #25  pc 0x0000000000532198  /apex/com.android.art/lib64/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+104)
  #26  pc 0x0000000000533398  /apex/com.android.art/lib64/libart.so (art::JValue art::InvokeVirtualOrInterfaceWithJValues<art::ArtMethod*>(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, art::ArtMethod*, jvalue const*)+440)
  #27  pc 0x00000000005808b8  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallback(void*)+1272)
  #28  pc 0x00000000000f40c4  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+64)
  #29  pc 0x000000000008ed10  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

@rh101
Contributor

rh101 commented May 25, 2023

@kiranb47 Crashes that affect a small subset of users may have very specific triggers, and since you are the one with access to the source code, you're in the best position to narrow down the possible causes. Do you collect any form of diagnostic information from your application? Can you tell what a user is doing in the app just prior to the crash?

@solan-solan
Contributor

According to the provided logs, all your issues are related to GL buffers. The faults occurred inside glDrawElements, glDrawArrays, and during buffer creation. For example, axmol supports int as the base type for index arrays, as opposed to cocos. I would first check how the buffers are created in your code, since the issues look logical. Did you create them manually?

@kiranb47
Author

Hey @rh101 Firebase is not reporting this issue, so we are not able to identify the pre-crash event logs. Hey @solan-solan I have no technical knowledge about OpenGL. We have not applied any changes to the axmol source code for this game.

@kiranb47 kiranb47 changed the title Android crash 1 CommandBufferGL::drawElements Crash Jul 4, 2023
@kiranb47
Author

kiranb47 commented Jul 4, 2023

@rh101 @solan-solan The issue is related to ui::ScrollView.

@rh101
Contributor

rh101 commented Jul 4, 2023

It's great that you have managed to narrow down the issue. Can you show the snippet of code related to how you use the ui::ScrollView? Include how you set it up/initialize it, and any other related code. If you could create a small project with just the ui::ScrollView usage then that would help.

@crazyhappygame
Contributor

FYI. I migrated my app from cocos2d-x 3.17 to latest axmol (4e664e6) and see in Google Play similar crashes in drawElements and drawArrays

ax::backend::CommandBufferGL::drawElements(ax::backend::PrimitiveType, ax::backend::IndexFormat, unsigned long, unsigned long, bool)

and

ax::backend::CommandBufferGL::drawArrays(ax::backend::PrimitiveType, unsigned long, unsigned long, b

In my game I use ui::ScrollView as well. I used many (100-1000) Sprite and Scale9Sprite.
@kiranb47 how did you narrow problem to ui::ScrollView ?

The app built with cocos2d-x 3.17 had ~0 crashes; after upgrading I see a significant increase. I cannot reproduce the issue on local devices.

The screenshot below shows how much crashes increased in Google Play after the update:
image

@DelinWorks
Contributor

DelinWorks commented Jul 6, 2023

Is your project open source? If so, I can debug and potentially fix this for you. Otherwise, why not put together a small project that showcases the issue, like @rh101 said?

@crazyhappygame
Contributor

@DelinWorks Thank you for the offer.

The problem is that I cannot reproduce the problem locally. I see crashes only in Google Play. I am sorry, but my project is not open source.
Based on the callstacks above, it looks like we crash somewhere here:

void CommandBufferGL::drawElements(PrimitiveType primitiveType,
                                   IndexFormat indexType,
                                   std::size_t count,
                                   std::size_t offset,
                                   bool wireframe)
{
    prepareDrawing();
#ifndef AX_USE_GLES  // glPolygonMode is only supported in Desktop OpenGL
    if (wireframe) glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
#else
    if (wireframe) primitiveType = PrimitiveType::LINE;
#endif
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, _indexBuffer->getHandler());
    glDrawElements(UtilsGL::toGLPrimitiveType(primitiveType), count, UtilsGL::toGLIndexType(indexType),
                   (GLvoid*)offset);

That could mean that _indexBuffer is incorrect or parameters to glDrawElements are incorrect.
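
One way to test the second hypothesis from inside the engine would be to check that the requested index range fits inside the bound index buffer before issuing the draw call. The following is a minimal sketch, not axmol code: `indexRangeFits`, `indexRangeExample`, and the plain byte sizes passed to them are hypothetical names introduced for illustration, and it assumes `offset` is a byte offset into the index buffer, as the `(GLvoid*)offset` cast above suggests.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of axmol): returns true when reading
// `count` indices of `indexBytes` bytes each, starting at byte `offset`,
// stays inside an index buffer that holds `bufferBytes` bytes.
// Uses subtraction and division so the check itself cannot overflow.
inline bool indexRangeFits(std::size_t count,
                           std::size_t offset,
                           std::size_t indexBytes,
                           std::size_t bufferBytes)
{
    if (indexBytes == 0 || offset > bufferBytes)
        return false;
    return count <= (bufferBytes - offset) / indexBytes;
}

// Usage example: a quad drawn with six 16-bit indices in a 12-byte buffer.
inline void indexRangeExample()
{
    // The whole buffer is a valid range.
    assert(indexRangeFits(6, 0, 2, 12));
    // One index too many would read past the end of the buffer.
    assert(!indexRangeFits(7, 0, 2, 12));
    // An offset beyond the buffer is rejected outright.
    assert(!indexRangeFits(1, 16, 2, 12));
}
```

In a debug build, a caller could evaluate such a check right before glDrawElements and log the values when it fails, which might reveal a bad count or offset on the affected devices.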

Unfortunately I am not familiar with GL. Given the callstacks above, do you think it would be possible to analyze the existing code and narrow down the problem through code inspection?
@DelinWorks

@DelinWorks
Contributor

Were you able to reproduce it on your device, or is it a ghost crash seen only in Google's analysis?

@DelinWorks
Contributor

DelinWorks commented Jul 6, 2023

If Google's analysis can show you which population of devices it crashes on, that would be helpful.

@crazyhappygame
Contributor

crazyhappygame commented Jul 6, 2023

@DelinWorks I cannot reproduce the crash on my device. I see the crash only in the Google Play console.
In the Google Play console I see the crash on the following devices:
OPPO OP56F5 (A17) Android 12 (SDK 31)
Redmi lemon (Redmi 9T) Android 10 (SDK 29)
samsung a70q (Galaxy A70) Android 11 (SDK 30)
samsung starlte (Galaxy S9) Android 10 (SDK 29)

FYI, this game was updated 3 days ago and only ~2K users have updated to the new version.

@DelinWorks
Contributor

Unfortunately you'll need to get your hands on one of those devices to test it, or try to roughly match the specs, OS version, and SDK of the devices to hopefully reproduce the bug.

@rh101
Contributor

rh101 commented Jul 6, 2023

@crazyhappygame @kiranb47

Is there any chance you could create a debug version of your applications, and upload them to the "Internal testing" track in your Google Play Console? I can provide you with a gmail address to add to the internal tester list, and I suggest others here provide an email address too if they can help (best to generate a new gmail address for this purpose, and not use your personal address). Once you upload the APK to that internal testing track, you can share the link to it so that the users on your testers list may download it.

This issue has me a little concerned, and I'm going to try to get my hands on at least one of the specific devices that seem to be having this issue. Also, describe exactly what it is that needs to be done in the application to cause it to crash (like which screen to go to, what exactly to click on etc).

Also, @crazyhappygame which of the devices that you listed have the most crashes?

Another thing, we really need a sample of the ui::ScrollView usage. We don't need an entire app source code, just any code related to how the ui::ScrollView is being created and used. If you can extract that section of code from your apps, remove/rename any private/proprietary info, and provide that, then it may help others replicate this issue.

@solan-solan
Contributor

@crazyhappygame

I used many (100-1000) Sprite and Scale9Sprite.
@kiranb47 how did you narrow problem to ui::ScrollView ?

Maybe this is some type of memory leak? Are you sure that the GL buffers and internal arrays of your sprites are properly freed? And is there any way to analyze how much time a user spent in the game before the crash?

@crazyhappygame
Contributor

@solan-solan I used just Sprite and Scale9Sprite with images. No custom tricks, shaders, etc. I'm not sure how to find out how much time a user spends in the app before the crash.

What I have seen in the stack trace is that it crashes in memcpy. To me that means the problem is not with allocation but with access to unavailable memory.

@rh101 @DelinWorks @kiranb47 do you know how to build everything (all 3rd-party libs, axmol, game) with address sanitizer? https://developer.android.com/ndk/guides/asan#cmake
I have a strong feeling that it could solve this mystery.

@crazyhappygame
Contributor

@rh101 could you send me instructions on how to build a debug version of the app (Gradle assembleDebug?) and the Gmail address?

@rh101
Contributor

rh101 commented Jul 8, 2023

@rh101 could you send me instructions on how to build a debug version of the app (Gradle assembleDebug?) and the Gmail address?

If you're using Android Studio, then just use the menus to configure it:
image

and this window pops up:
image

There may be a way to do it from the command line too, but I can't recall it at the moment, so perhaps someone else knows how to do it.

EDIT: email removed. @crazyhappygame if you need it again, I'll post it up.

@rh101
Contributor

rh101 commented Jul 8, 2023

@rh101 @DelinWorks @kiranb47 do you know how to build everything (all 3rd-party libs, axmol, game) with address sanitizer? https://developer.android.com/ndk/guides/asan#cmake

Are the instructions on that page not working for you?

@crazyhappygame
Contributor

The flags below have to be set for all targets, including all 3rd-party libs. I do not know how to rebuild all the Android deps.

set_target_properties(${TARGET} PROPERTIES LINK_FLAGS -fsanitize=address)

@stale

stale bot commented Sep 6, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Sep 6, 2023
@stale stale bot removed the wontfix This will not be worked on label Sep 11, 2023
@rh101
Contributor

rh101 commented May 19, 2024

@battulasaivinesh Does your application have any full-screen video ads (using an ad SDK etc.) or similar that would cause the app to be put into the background then back to the foreground on completion?

Also, what percentage of users/devices are experiencing this specific crash?

@battulasaivinesh

Yes, we do have rewarded video ads at multiple places in-game. We use Google Ads SDK for the same. Currently, we just rolled out axmol engine change to 100% yesterday. Out of the total crashes happening with this release (Around 7k in the last 24 hrs), drawElements crashes take almost 25% (2k). In terms of total users though, this crash is happening for less than 0.5% of users.

@halx99
Collaborator

halx99 commented May 19, 2024

So, just putting it out there, is there any chance that the call to glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0); is still required, as per the comment for CC_REBIND_INDICES_BUFFER/AX_REBIND_INDICES_BUFFER?

It is here: https://github.com/axmolengine/axmol/blob/dev/core/renderer/backend/opengl/CommandBufferGL.cpp#L255

Yes, but I was wondering if the call to glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0); was required too.

Maybe somebody can make the change and test it.

@smilediver
Contributor

Another potential issue is that CommandBufferGL::bindVertexBuffer() binds the buffer only if the vertex format is valid. There is no error reporting up the call stack, so it would keep drawing with the previously bound buffer.

void CommandBufferGL::bindVertexBuffer(uint32_t& usedBits) const
{
    // Bind vertex buffers and set the attributes.
    auto vertexLayout = _programState->getVertexLayout();
    const auto& attributes = vertexLayout->getAttributes();
    if (!vertexLayout->isValid())
        return;
    // Bind VAO, engine share 1 VAO for all vertexLayouts aka vfmts
    // optimize proposal: create VAO per vertexLayout, just need bind VAO
    __gl->bindBuffer(BufferType::ARRAY_BUFFER, _vertexBuffer->getHandler());
    for (const auto& attributeInfo : attributes)
    {
        const auto& attribute = attributeInfo.second;
        __gl->enableVertexAttribArray(attribute.index);
        glVertexAttribPointer(attribute.index, UtilsGL::getGLAttributeSize(attribute.format),
                              UtilsGL::toGLAttributeType(attribute.format), attribute.needToBeNormallized,
                              vertexLayout->getStride(), (GLvoid*)attribute.offset);
        // non-instance attrib not use divisor, so clear to 0
        __gl->clearVertexAttribDivisor(attribute.index);
        usedBits |= (1 << attribute.index);
    }
}
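
A minimal sketch of one possible way to surface that failure, using stand-in types rather than the real axmol classes: `FakeLayout`, `FakeCommandBuffer`, `bindExample`, and the `drawCalls`/`skippedDraws` counters are hypothetical and exist only for illustration. The idea is that the bind step reports whether it actually bound anything, and the draw is skipped when it did not, instead of drawing with whatever buffer was bound previously.

```cpp
#include <cassert>

// Stand-in for a vertex layout; only the validity flag matters here.
struct FakeLayout
{
    bool valid = false;
    bool isValid() const { return valid; }
};

// Stand-in for the command buffer; counters replace the real GL calls.
struct FakeCommandBuffer
{
    int drawCalls    = 0;
    int skippedDraws = 0;

    // Returns false instead of returning silently when the layout is invalid.
    bool bindVertexBuffer(const FakeLayout& layout) const
    {
        if (!layout.isValid())
            return false;
        // Real code would bind the buffer and set the vertex attributes here.
        return true;
    }

    // Draws only if the bind step succeeded; otherwise records the skip.
    bool draw(const FakeLayout& layout)
    {
        if (!bindVertexBuffer(layout))
        {
            ++skippedDraws;
            return false;
        }
        ++drawCalls;
        return true;
    }
};

// Usage example: an invalid layout is skipped, a valid one is drawn.
inline void bindExample()
{
    FakeCommandBuffer cb;
    FakeLayout bad;
    FakeLayout good;
    good.valid = true;

    assert(!cb.draw(bad));
    assert(cb.draw(good));
    assert(cb.skippedDraws == 1 && cb.drawCalls == 1);
}
```

Whether skipping the draw, logging, or asserting is the right reaction in the engine is a separate design decision; the sketch only shows the failure being made visible to the caller.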

@rh101
Contributor

rh101 commented May 19, 2024

Yes, we do have rewarded video ads at multiple places in-game. We use Google Ads SDK for the same. Currently, we just rolled out axmol engine change to 100% yesterday. Out of the total crashes happening with this release (Around 7k in the last 24 hrs), drawElements crashes take almost 25% (2k). In terms of total users though, this crash is happening for less than 0.5% of users.

The reason I asked is because this issue has existed since Cocos2d-x v4 came out, as that's when the OpenGL renderer changed significantly from Cocos2d-x v3.17. This issue was posted about here on the Cocos forums: https://discuss.cocos2d-x.org/t/commandbuffergl-related-crash/50293/14

Check if any info there regarding full-screen ads helps at all, because it was the cause of that specific developer's problem.

@rh101
Contributor

rh101 commented May 19, 2024

Another potential issue is that CommandBufferGL::bindVertexBuffer() binds the buffer only if the vertex format is valid. There is no error reporting up the call stack, so it would keep drawing with the previously bound buffer.

That is a problem that needs to be fixed, whether or not it is the cause of the crash described in this thread.

@battulasaivinesh

@rh101 Thanks for the link. All the callbacks from ads sdk are wrapped around performFunctionInCocosThread, so not the cause of the crash. We have a release tomorrow, should I try adding back glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);?

@rh101
Contributor

rh101 commented May 19, 2024

@rh101 Thanks for the link. All the callbacks from ads sdk are wrapped around performFunctionInCocosThread, so not the cause of the crash.

Fair enough.

We have a release tomorrow, should I try adding back glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);?

That's entirely up to you.

@battulasaivinesh

As suggested by @rarepixels here - #1211 (comment), simply setting "Preserve EGL context" to false (this.mGLSurfaceView.setPreserveEGLContextOnPause(false)) and commenting out //super.onPause(); completely eliminated all crashes related to drawing elements.

I'm not entirely certain if this is the correct approach, but Cocos also implemented the same solution - cocos2d/cocos2d-x#19996.

@rh101
Contributor

rh101 commented May 29, 2024

As suggested by @rarepixels here - #1211 (comment), simply setting "Preserve EGL context" to false (this.mGLSurfaceView.setPreserveEGLContextOnPause(false)) and commenting out //super.onPause(); completely eliminated all crashes related to drawing elements.

Calling this.mGLSurfaceView.setPreserveEGLContextOnPause(false) should be completely fine, especially if your app has enabled auto-restarting on context loss by AX_ENABLE_RESTART_APPLICATION_ON_CONTEXT_LOST, so if that fixes the issue, then great!

One thing that would need to be looked into is any side-effect of commenting out //super.onPause();.

@rh101
Contributor

rh101 commented May 29, 2024

@battulasaivinesh Looking more into the AxmolGLSurfaceView.onPause(), if super.onPause(); is commented out, then it means the renderer is never paused when the app is in the background, and while the app may not crash, it will be using up resources, and may eventually crash on multiple background/foreground switches.

The other issue is that AxmolGLSurfaceView.onResume() is still called regardless, which then calls super.onResume(), and ends up in this:

public void onResume() {
    synchronized (sGLThreadManager) {
        if (LOG_PAUSE_RESUME) {
            Log.i("GLThread", "onResume tid=" + getId());
        }
        mRequestPaused = false;
        mRequestRender = true;
        mRenderComplete = false;
        sGLThreadManager.notifyAll();
        while ((! mExited) && mPaused && (!mRenderComplete)) {
            if (LOG_PAUSE_RESUME) {
                Log.i("Main thread", "onResume waiting for !mPaused.");
            }
            try {
                sGLThreadManager.wait();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
    }
}

The renderer was never actually paused, so that loop would still be running, and now we end up in another while loop every time the app is put into the background and then brought back to the foreground.

I feel that the logic should be improved in the main activity, being AxmolActivity.

Activity.onPause() is called when an activity is still visible, but no longer has focus, so technically the renderer should still be active and rendering. Only when the activity is put in the background, and Activity.onStop() is called, should the renderer actually stop, meaning AxmolGLSurfaceView.onPause() should be called then. A remark is made about this in the Android documentation for GLSurfaceView here to the same effect.

So, perhaps the code in AxmolActivity should be updated to this:

    private boolean rendererPaused = true;

    private void resume() {
        this.hideVirtualButton();
        AxmolEngine.onResume();
        if (rendererPaused) {
            mGLSurfaceView.onResume();
            rendererPaused = false;
        }
        mGLSurfaceView.setRenderMode(GLSurfaceView.RENDERMODE_CONTINUOUSLY);
    }

    @Override
    protected void onPause() {
        Log.d(TAG, "onPause()");
        paused = true;
        super.onPause();
        AxmolEngine.onPause();
        mGLSurfaceView.setRenderMode(GLSurfaceView.RENDERMODE_WHEN_DIRTY);
    }

    @Override
    protected void onStop() {
        super.onStop();
        rendererPaused = true;
        mGLSurfaceView.onPause();
    }

So, essentially what happens is that if the app no longer has focus but is still visible to the user, Activity.onPause() is called and the renderer is set to RENDERMODE_WHEN_DIRTY. We do not pause the GLSurfaceView just yet (as per the Android documentation). If the user clicks back into the app, then we set the render mode back to RENDERMODE_CONTINUOUSLY, and we do not call GLSurfaceView.onResume(), since the GLSurfaceView was never paused to begin with.

Now, if the user has navigated away from the app, or closed it somehow, then Activity.onStop() would be called. This is when GLSurfaceView.onPause() should be called, at which point the renderer loop is no longer running, and the EGL context may also be destroyed.

If the app is brought back to the foreground, then Activity.onResume() is called, and since the GLSurfaceView was in fact paused, then it will call AxmolGLSurfaceView.onResume(), which would be correct this time, and not end up creating multiple while loops doing the exact same thing (if I am correctly understanding what GLSurfaceView.onResume() is doing).

I tested a few non-Axmol apps to see what happens when the "switch apps" button is clicked (the square button). All the apps simply reduced in size and lost focus, but they never stopped animating/running, so their render loops were in fact still operating (Activity.onPause() is called, but not GLSurfaceView.onPause()). Only once I navigated away from the apps did they stop animating etc., so that most likely means GLSurfaceView.onPause() was called from Activity.onStop().

I'm not sure if this would actually fix any issue, but the primary purpose of this change is to stop the renderer only when required, which is when the current app activity is replaced by another activity, and there is no point destroying the EGL context if the app never actually goes into the background.
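
The intended pairing of calls can be illustrated with a small, platform-independent model. This is only a sketch of the bookkeeping in the proposal above, written in C++ so that it runs without Android; `LifecycleModel`, `lifecycleExample`, and the counters are hypothetical and do not correspond to any Android or Axmol class.

```cpp
#include <cassert>

// Models only the state logic of the proposed AxmolActivity change.
struct LifecycleModel
{
    bool rendererPaused  = true;  // the GL view starts out not running
    bool continuous      = false; // stands in for RENDERMODE_CONTINUOUSLY
    int  viewResumeCalls = 0;     // stands in for mGLSurfaceView.onResume()
    int  viewPauseCalls  = 0;     // stands in for mGLSurfaceView.onPause()

    // Activity gains focus: resume the view only if it was really paused.
    void onResume()
    {
        if (rendererPaused)
        {
            ++viewResumeCalls;
            rendererPaused = false;
        }
        continuous = true;
    }

    // Activity loses focus but may still be visible: only change render mode.
    void onPause() { continuous = false; }

    // Activity is no longer visible: now the view is actually paused.
    void onStop()
    {
        rendererPaused = true;
        ++viewPauseCalls;
    }
};

// Usage example: focus loss alone never pauses or re-resumes the view.
inline void lifecycleExample()
{
    LifecycleModel m;
    m.onResume();                 // first start
    assert(m.viewResumeCalls == 1 && m.continuous);

    m.onPause();                  // e.g. "switch apps" overview, still visible
    m.onResume();
    assert(m.viewResumeCalls == 1 && m.viewPauseCalls == 0);

    m.onPause();                  // user navigates away
    m.onStop();
    m.onResume();                 // user comes back
    assert(m.viewResumeCalls == 2 && m.viewPauseCalls == 1);
}
```

In this model every view-level resume is preceded by a view-level pause (or by the initial start), which is the property the proposed change is after.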

@smilediver
Contributor

For starters it would be good to check and fix this issue: #1211 (comment). If no one checks it, I'll probably have some time to look into it this or next week.

@rh101
Contributor

rh101 commented May 29, 2024

As suggested by @rarepixels here - #1211 (comment), simply setting "Preserve EGL context" to false (this.mGLSurfaceView.setPreserveEGLContextOnPause(false)) and commenting out //super.onPause(); completely eliminated all crashes related to drawing elements.

One additional thing regarding this apparent fix. It's not actually fixing the issue, but rather avoiding the issue. If GLSurfaceView.onPause() is not called, then the context is never removed, regardless of whether setPreserveEGLContextOnPause is true or false. You can see that here in GLSurfaceView.java:

private void guardedRun() throws InterruptedException {
...
        while (true) {
...
                            // When pausing, release the EGL surface:
                            if (pausing && mHaveEglSurface) {
                                if (LOG_SURFACE) {
                                    Log.i("GLThread", "releasing EGL surface because paused tid=" + getId());
                                }
                                stopEglSurfaceLocked();
                            }

                            // When pausing, optionally release the EGL Context:
                            if (pausing && mHaveEglContext) {
                                GLSurfaceView view = mGLSurfaceViewWeakRef.get();
                                boolean preserveEglContextOnPause = view == null ?
                                        false : view.mPreserveEGLContextOnPause;
                                if (!preserveEglContextOnPause) {
                                    stopEglContextLocked();
                                    if (LOG_SURFACE) {
                                        Log.i("GLThread", "releasing EGL context because paused tid=" + getId());
                                    }
                                }
                            }
...
        }
}

Since super.onPause() is commented out, GLSurfaceView.onPause() is never called, so the view is never paused and the GL thread never enters the block that checks preserveEglContextOnPause.

The reason it no longer crashes is that the EGL context is never released, since the GLSurfaceView is never paused. As I mentioned in the previous post, though, this may have side effects.
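To make the point concrete, the two "when pausing" checks from the excerpt above can be reduced to a small truth table. This is a simplified C++ model, not the Android source; `EglState` and `afterPauseCheck` are names invented for the sketch.

```cpp
#include <cassert>

// Simplified model of the EGL state tracked by the GL thread.
struct EglState
{
    bool haveSurface = true;
    bool haveContext = true;
};

// `pausing` is only ever true after GLSurfaceView.onPause() has been requested.
EglState afterPauseCheck(EglState s, bool pausing, bool preserveContextOnPause)
{
    if (pausing && s.haveSurface)
        s.haveSurface = false;  // corresponds to stopEglSurfaceLocked()

    if (pausing && s.haveContext && !preserveContextOnPause)
        s.haveContext = false;  // corresponds to stopEglContextLocked()

    return s;
}
```

When `pausing` is false, the result is the same for either value of `preserveContextOnPause`, which is why the flag has no effect once super.onPause() is commented out.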

@rh101
Contributor

rh101 commented May 29, 2024

Just discovered a problem that may affect anyone who changes the enabled state of EventDispatcher in AppDelegate::applicationDidEnterBackground() and AppDelegate::applicationWillEnterForeground(). If the event dispatcher is disabled in applicationDidEnterBackground(), then upon entering this method:

JNIEXPORT void JNICALL Java_org_axmol_lib_AxmolRenderer_nativeInit(JNIEnv*, jclass, jint w, jint h)
{
    auto director = ax::Director::getInstance();
    auto glView   = director->getGLView();
    if (!glView)
    {
        glView = ax::GLViewImpl::create("Android app");
        glView->setFrameSize(w, h);
        director->setGLView(glView);

        ax::Application::getInstance()->run();
    }
    else
    {
        backend::DriverBase::getInstance()->resetState();
        ax::Director::getInstance()->resetMatrixStack();
        ax::EventCustom recreatedEvent(EVENT_RENDERER_RECREATED);
        director->getEventDispatcher()->dispatchEvent(&recreatedEvent);
        director->setGLDefaultValues();
#if AX_ENABLE_CACHE_TEXTURE_DATA
        ax::VolatileTextureMgr::reloadAllTextures();
#endif
    }
}

The EVENT_RENDERER_RECREATED event is never dispatched, because the event dispatcher is still disabled.

That is assuming that Axmol has not been modified, meaning GLSurfaceView.onPause() is still called (super.onPause() is not commented out in AxmolGLSurfaceView.onPause()). This means no EVENT_RENDERER_RECREATED listeners are called to run whatever logic is required after an EGL context loss, so data belonging to the destroyed context is still being accessed, which will result in a crash.
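A minimal stand-alone sketch of the failure mode, assuming the dispatcher simply returns early when disabled. `MockDispatcher`, `recoveryRan` and the `"renderer_recreated"` string are stand-ins invented for this example, not Axmol types or constants.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for the engine's dispatcher: only the enabled check is modelled.
class MockDispatcher
{
public:
    void setEnabled(bool enabled) { _isEnabled = enabled; }

    void addListener(const std::string& name, std::function<void()> callback)
    {
        _listeners[name].push_back(std::move(callback));
    }

    void dispatch(const std::string& name)
    {
        if (!_isEnabled)
            return;  // the event is silently dropped
        auto it = _listeners.find(name);
        if (it == _listeners.end())
            return;
        for (auto& callback : it->second)
            callback();
    }

private:
    bool _isEnabled = true;
    std::unordered_map<std::string, std::vector<std::function<void()>>> _listeners;
};

// Returns true if the "renderer recreated" listener ran after a simulated
// background/foreground cycle in which the context was lost.
bool recoveryRan(bool disableInBackground)
{
    MockDispatcher dispatcher;
    bool reloaded = false;
    dispatcher.addListener("renderer_recreated", [&reloaded] { reloaded = true; });

    if (disableInBackground)
        dispatcher.setEnabled(false);  // e.g. done in applicationDidEnterBackground()

    dispatcher.dispatch("renderer_recreated");  // what nativeInit() attempts on return
    return reloaded;
}
```

In this model the recovery listener runs only when the dispatcher was left enabled; with it disabled, nothing reloads and any later use of the old GPU resources would be invalid.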

I'm curious to know whether the developers experiencing the crash in their apps are disabling the event dispatcher at any point when the application is paused. @battulasaivinesh @rarepixels @kiranb47 @crazyhappygame Can you please check whether you have code that disables the event dispatcher on an application pause, such as in AppDelegate::applicationDidEnterBackground(), and let us know either way?

You can easily check this by doing the following:

Make sure AX_ENABLE_CACHE_TEXTURE_DATA is equal to 1.

In AxmolActivity.java, set preserve context to false: this.mGLSurfaceView.setPreserveEGLContextOnPause(false);

Put a breakpoint on Java_org_axmol_lib_AxmolRenderer_nativeInit, and when it gets to the following line:
director->getEventDispatcher()->dispatchEvent(&recreatedEvent);

Check if it is in fact correctly dispatching the event. Alternatively, put a breakpoint on any handler function for EVENT_RENDERER_RECREATED, and see if it gets called.

Now, navigate away from your application to another app so it goes into the background, then go back to the same application again. If it does not enter any handlers for EVENT_RENDERER_RECREATED, then that may actually be the cause of all the crashes.

@smilediver
Contributor

It starts in the disabled state too, which might be already problematic:

EventDispatcher::EventDispatcher() : _inDispatch(0), _isEnabled(false), _nodePriorityIndex(0)

TransitionScene class disables events.

Looks like these events should ignore disabled dispatcher:

  • EVENT_COME_TO_FOREGROUND
  • EVENT_COME_TO_BACKGROUND
  • EVENT_RENDERER_RECREATED
  • EVENT_APP_RESTARTING

I think Event class could be extended to support some sort of "undisableable" and "unstoppable" events.
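One way to picture the "undisableable" idea is a flag carried by the event itself, which the dispatcher consults before dropping it. This is purely a sketch of that design option; `FlaggedEvent`, `alwaysDeliver` and `FlaggedDispatcher` are hypothetical names and nothing here is existing Axmol API.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical event that carries its own "cannot be suppressed" flag.
struct FlaggedEvent
{
    FlaggedEvent(std::string eventName, bool always = false)
        : name(std::move(eventName)), alwaysDeliver(always)
    {}

    std::string name;
    bool alwaysDeliver;
};

class FlaggedDispatcher
{
public:
    void setEnabled(bool enabled) { _isEnabled = enabled; }

    // Returns true if the event would be delivered to listeners.
    bool dispatch(const FlaggedEvent& event)
    {
        if (!_isEnabled && !event.alwaysDeliver)
            return false;  // ordinary events are dropped while disabled
        ++_delivered;
        return true;
    }

    int delivered() const { return _delivered; }

private:
    bool _isEnabled = false;  // starts disabled, as noted above
    int _delivered  = 0;
};
```

The trade-off versus a per-call flag is that the decision is made once, where the event type is defined, rather than at every dispatch site.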

@rh101
Contributor

rh101 commented May 29, 2024

It starts in the disabled state too, which might be already problematic:

EventDispatcher::EventDispatcher() : _inDispatch(0), _isEnabled(false), _nodePriorityIndex(0)

TransitionScene class disables events.

That's actually interesting, because given the low percentage of affected users, it's most likely a very specific set of conditions that is triggering it. This scenario is most likely to cause a crash:

1 - event dispatcher is disabled (such as on entering a scene transition)
2 - the app is moved to the background for any reason
3 - the app is then moved back to the foreground
4 - if the EGL context is lost, then it will attempt to recover/restart depending on the configuration (AX_ENABLE_CACHE_TEXTURE_DATA == 1 or AX_ENABLE_RESTART_APPLICATION_ON_CONTEXT_LOST == 1)
5 - since the event dispatcher is still disabled, EVENT_RENDERER_RECREATED (or EVENT_APP_RESTARTING) would never be sent out

I can reproduce this issue 100% of the time with a test case, by setting preserve EGL context to false with this.mGLSurfaceView.setPreserveEGLContextOnPause(false);.

Looks like these events should ignore disabled dispatcher:

  • EVENT_COME_TO_FOREGROUND
  • EVENT_COME_TO_BACKGROUND
  • EVENT_RENDERER_RECREATED
  • EVENT_APP_RESTARTING

I think Event class could be extended to support some sort of "undisableable" and "unstoppable" events.

Something does need to be done about this.

@rh101
Contributor

rh101 commented May 29, 2024

For starters it would be good to check and fix this issue: #1211 (comment). If no one checks it, I'll probably have some time to look into it this or next week.

With the suggested changes in my previous post above, that code would no longer be required, since applicationDidEnterBackground() and applicationWillEnterForeground() should no longer be called on app start-up or at incorrect times, and neither should be called twice.

@solan-solan
Contributor

Why not forcibly enable the event dispatcher inside Java_org_axmol_lib_AxmolRenderer_nativeInit before dispatching EVENT_RENDERER_RECREATED?

@rh101
Contributor

rh101 commented May 30, 2024

Why not forcibly enable the event dispatcher inside Java_org_axmol_lib_AxmolRenderer_nativeInit before dispatching EVENT_RENDERER_RECREATED?

That would require code to save the current state of the event dispatcher, enable it, then reset it to the previous state, and this would also be required on each and every event that must be sent out, such as EVENT_RENDERER_RECREATED.
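For reference, the save/enable/restore approach could be wrapped in a scope guard along these lines. It is a sketch only: `ToggleDispatcher` is a stand-in exposing just an enabled flag, and `ScopedEnable` is a hypothetical helper, not something that exists in the engine.

```cpp
#include <cassert>

// Stand-in for the dispatcher; only the enabled flag is modelled.
class ToggleDispatcher
{
public:
    bool isEnabled() const { return _isEnabled; }
    void setEnabled(bool enabled) { _isEnabled = enabled; }

private:
    bool _isEnabled = false;
};

// Hypothetical guard: enables the dispatcher for the current scope and
// restores whatever state it had before when the scope ends.
class ScopedEnable
{
public:
    explicit ScopedEnable(ToggleDispatcher& dispatcher)
        : _dispatcher(dispatcher), _previous(dispatcher.isEnabled())
    {
        _dispatcher.setEnabled(true);
    }

    ~ScopedEnable() { _dispatcher.setEnabled(_previous); }

    ScopedEnable(const ScopedEnable&)            = delete;
    ScopedEnable& operator=(const ScopedEnable&) = delete;

private:
    ToggleDispatcher& _dispatcher;
    bool _previous;
};
```

Even with a guard, every call site that sends a must-deliver event would need one, and a listener that changes the enabled state during the dispatch would have its change overwritten when the guard restores the old value.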

Aside from creating a better solution for these types of events, a quick fix would be to just add an optional parameter to all dispatchEvent() methods to indicate whether to force the event to be sent out, regardless of the event dispatcher's enabled state. For example, a parameter named forced, which defaults to false:

void dispatchEvent(Event* event, bool forced = false);
void dispatchCustomEvent(std::string_view eventName, void* optionalUserData = nullptr, bool forced = false);
void EventDispatcher::dispatchEvent(Event* event, bool forced)
{
    if (!_isEnabled && !forced)
        return;
...
}

Checking through the rest of the code in the dispatcher, nothing checks for the _isEnabled flag, so that should work.

This change should not impact existing usage of the dispatch methods. Does it seem like a reasonable change?
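A self-contained mock of how the `forced` parameter would behave, assuming the early return is the only place the enabled flag is consulted. `ForcedDispatcher` and `countDeliveries` are illustrative names, and the listener handling is reduced to a plain callback list.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Mock dispatcher with the proposed optional `forced` parameter.
class ForcedDispatcher
{
public:
    void setEnabled(bool enabled) { _isEnabled = enabled; }

    void addListener(std::function<void()> callback)
    {
        _listeners.push_back(std::move(callback));
    }

    // `forced` defaults to false, so existing call sites behave as before.
    void dispatchEvent(bool forced = false)
    {
        if (!_isEnabled && !forced)
            return;
        for (auto& listener : _listeners)
            listener();
    }

private:
    bool _isEnabled = false;
    std::vector<std::function<void()>> _listeners;
};

// Returns how many times a single listener ran for one dispatch.
int countDeliveries(bool enabled, bool forced)
{
    ForcedDispatcher dispatcher;
    dispatcher.setEnabled(enabled);

    int calls = 0;
    dispatcher.addListener([&calls] { ++calls; });
    dispatcher.dispatchEvent(forced);
    return calls;
}
```

Only the disabled-and-not-forced combination drops the event; the other three combinations deliver it once.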

@halx99
Collaborator

halx99 commented May 30, 2024

lgtm, PR?

@rh101
Contributor

rh101 commented May 30, 2024

lgtm, PR?

Done, #1940

With this change, having AX_ENABLE_CACHE_TEXTURE_DATA enabled should actually work to recover the texture data after an EGL context loss. There will be no need to use AX_ENABLE_RESTART_APPLICATION_ON_CONTEXT_LOST in such cases, unless you have runtime-generated textures that cannot be reloaded and you don't want to recreate them on an EVENT_RENDERER_RECREATED event. That's really the only difference, so if you can recreate such textures, then handling the EVENT_RENDERER_RECREATED event would be a better option than having the entire application restart.
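As a rough illustration of "recreating such textures", one option is to keep each runtime-generated texture together with the function that builds it, and re-run those functions from an EVENT_RENDERER_RECREATED listener. The sketch below is a stand-alone model: `RuntimeTextureRegistry` is hypothetical, texture handles are plain integers, and no real GPU or engine calls are made.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry pairing each runtime-generated texture with the
// function that can rebuild it after a context loss.
class RuntimeTextureRegistry
{
public:
    using Generator = std::function<int()>;  // returns a fake texture handle

    void add(const std::string& key, Generator generator)
    {
        _handles[key]    = generator();
        _generators[key] = std::move(generator);
    }

    // Simulates the context being destroyed: every handle becomes invalid (0).
    void onContextLost()
    {
        for (auto& entry : _handles)
            entry.second = 0;
    }

    // Intended to be called from an EVENT_RENDERER_RECREATED listener.
    void onRendererRecreated()
    {
        for (auto& entry : _generators)
            _handles[entry.first] = entry.second();
    }

    int handle(const std::string& key) const
    {
        auto it = _handles.find(key);
        return it == _handles.end() ? 0 : it->second;
    }

private:
    std::map<std::string, Generator> _generators;
    std::map<std::string, int> _handles;
};
```

Textures whose source data cannot be regenerated this way are the case where restarting the application may still be the simpler choice.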

It would be great if anyone experiencing the crash issue in their released application could test this out to verify if it in fact fixes the problem. All required events are now sent out correctly even if the event dispatcher is disabled.

@rarepixels

Hi all, I only just got all the new messages about this issue. I was away from my computer for the last few weeks.

I can say that I have had //super.onPause(); commented out for months, and yes, as @rh101 pointed out, this does not fix the problem but instead avoids it.

But for my two live apps, I can say that I have seen no side effects at all, or at least no new crashes or ANRs.

I hope to finish a new project soon, and then I will check how things are going in production with all the new changes. (I have still never seen this crash locally.)


stale bot commented Aug 5, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Aug 5, 2024
@stale stale bot closed this as completed Aug 13, 2024
@rh101
Contributor

rh101 commented Sep 22, 2024

Hi @rh101 , @crazyhappygame ,

I will wait until the end of the week, then revert all the other changes I made in the engine (like disabling setPreserveEGLContextOnPause) to try to pinpoint the problem, and upload a new version with just the changes in AxmolRenderer.java. If I still see no crashes in the render loop, I will open a PR.

But @crazyhappygame, if you want to try it sooner: apart from the other changes I made to clean up the if/else, you can just move up this line in axmol/core/platform/android/java/src/org/axmol/lib/AxmolRenderer.java changes

Swap "sleep & render" for "render & sleep":

sleep / background / Context Lost / render ~ crash
render / sleep / background / Context Lost ~ next render loop will have the textures again

It turns out there is an issue with this specific change, which is now fixed in #2162.

Along with that fix, there have been many other fixes related to crashes due to the EGL context loss issue. The current dev branch (future 2.2 release) should have, hopefully, addressed the crash issues completely.

@halx99 halx99 modified the milestones: LongTerms, 2.2 Sep 22, 2024