Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[main & 2.20] Windows not rendering, black rectangles, ... #3792

Closed
AlanGriffiths opened this issue Mar 3, 2025 · 7 comments · Fixed by #3793 or #3800
Closed

[main & 2.20] Windows not rendering, black rectangles, ... #3792

AlanGriffiths opened this issue Mar 3, 2025 · 7 comments · Fixed by #3793 or #3800
Assignees

Comments

@AlanGriffiths
Copy link
Collaborator

Running Miriway from both edge and beta gives a lot of rendering artefacts:

  • yambar is black unless cursor is moving over it
  • A lot of dialogs & popups are black until resized
  • synapse appears with a flickering black rectangle around it

I've eliminate Miriway changes as a cause by building Miriway/main locally (which picks up Mir 2.19.3 from ppa:mir-team/release)

(Running dual external monitors via HDMI/usb-c)

@AlanGriffiths AlanGriffiths self-assigned this Mar 3, 2025
@AlanGriffiths
Copy link
Collaborator Author

And... reverting #3784 fixes it

github-merge-queue bot pushed a commit that referenced this issue Mar 4, 2025
More complex than #3784, but
doesn't have unwanted side-effect

Fixes:  #3792
@AlanGriffiths
Copy link
Collaborator Author

AlanGriffiths commented Mar 4, 2025

Not sure where I went wrong last time around, nor why my "fix" seemed to fix it. But I've now tracked this down to the `Miroil" changes (#3767). That is a lot more plausible than the lifetime hacks...

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
960ea657dccce140ffd2699674f3680ad9e5e431
3425b4b7035959c8f1996370f3b47dbea1101bb2
8b48e1305ae3ee24cae15a4b7df46f3fc37d7f3c
a01417e4d47172fffc02bd7fdaaea5612989a682
We cannot bisect more!

@AlanGriffiths
Copy link
Collaborator Author

AlanGriffiths commented Mar 5, 2025

More notes on this:

  • I'm seeing this on my Framework (Ubuntu 24.04/amdgpu) laptop
    • Reproduce with main and miral-app
    • also with just the internal display
  • I don't see this on my X1C3 (Fedora 42/i915) laptop
  • I don't see this on my ProX15 (Ubuntu 24.04/i915) laptop

Obvious suspicion: we're doing something that upsets the amdgpu driver

@AlanGriffiths
Copy link
Collaborator Author

And... MESA_DEBUG is saying:

Mesa: error: GL_INVALID_ENUM in glGetIntegerv(pname=GL_TEXTURE_2D)

Which comes from here (commenting out the glGetIntegerv(GL_TEXTURE_2D, &previous_texture); call suppresses the error):

            // Paranoia: Save the current value for the GL state that we're modifying...
            GLint previous_texture;
            glGetIntegerv(GL_TEXTURE_2D, &previous_texture);

I think we should zero initialise previous_texture and test it before restoring the state...

diff --git a/src/platforms/common/server/shm_buffer.cpp b/src/platforms/common/server/shm_buffer.cpp
index 7459edd353..3f9caeb72e 100644
--- a/src/platforms/common/server/shm_buffer.cpp
+++ b/src/platforms/common/server/shm_buffer.cpp
@@ -110,7 +110,7 @@ auto get_tex_id_on_context(mgc::EGLContextExecutor& egl_executor) -> std::shared
             glGenTextures(1, &tex);
 
             // Paranoia: Save the current value for the GL state that we're modifying...
-            GLint previous_texture;
+            GLint previous_texture = 0;
             glGetIntegerv(GL_TEXTURE_2D, &previous_texture);
 
             glBindTexture(GL_TEXTURE_2D, tex);
@@ -121,7 +121,7 @@ auto get_tex_id_on_context(mgc::EGLContextExecutor& egl_executor) -> std::shared
             glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
 
             // ...and then restore the previous GL state.
-            glBindTexture(GL_TEXTURE_2D, static_cast<GLuint>(previous_texture));
+            if (previous_texture) glBindTexture(GL_TEXTURE_2D, static_cast<GLuint>(previous_texture));
             tex_promise->set_value(tex);
         });
     return tex_promise->get_future();

...but that doesn't seem to be the cause of the rendering problems

@AlanGriffiths
Copy link
Collaborator Author

Reverting the changes to ShmBuffer::bind() does fix the rendering problems...

diff --git a/src/platforms/common/server/shm_buffer.cpp b/src/platforms/common/server/shm_buffer.cpp
index 3f9caeb72e..4fdeeffa2c 100644
--- a/src/platforms/common/server/shm_buffer.cpp
+++ b/src/platforms/common/server/shm_buffer.cpp
@@ -222,7 +222,21 @@ mg::NativeBufferBase* mgc::ShmBuffer::native_buffer_base()
 
 void mgc::ShmBuffer::bind()
 {
-    glBindTexture(GL_TEXTURE_2D, tex_id());
+    std::lock_guard lock{tex_id_mutex};
+    bool const needs_initialisation = tex_id_ == 0;
+    if (needs_initialisation)
+    {
+        glGenTextures(1, &tex_id_);
+    }
+    glBindTexture(GL_TEXTURE_2D, tex_id_);
+    if (needs_initialisation)
+    {
+        // The ShmBuffer *should* be immutable, so we can just upload once.
+        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
+        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
+        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
+        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
+    }
 }
 
 auto mgc::ShmBuffer::tex_id() const -> GLuint
diff --git a/src/platforms/common/server/shm_buffer.h b/src/platforms/common/server/shm_buffer.h
index 1228698216..731b4a25d0 100644
--- a/src/platforms/common/server/shm_buffer.h
+++ b/src/platforms/common/server/shm_buffer.h
@@ -71,6 +71,9 @@ private:
     MirPixelFormat const pixel_format_;
     std::shared_ptr<EGLContextExecutor> const egl_delegate;
     std::shared_future<GLuint> const tex;
+
+    std::mutex tex_id_mutex;
+    GLuint tex_id_{0};
 };
 
 class MemoryBackedShmBuffer :

...although that leaves ShmBuffer in a pretty weird state of mixed implementations. (Not sure how to test the QtMir support)

@RAOF
Copy link
Contributor

RAOF commented Mar 6, 2025

Oh, no. This is an amdgpu driver bug - if you run with the mesa_glthread=false environment variable set you'll find everything works.

@RAOF
Copy link
Contributor

RAOF commented Mar 6, 2025

Oh, no. It's not an amdgpu driver bug, it's just that amdgpu is stricter with threading than other drivers.

@AlanGriffiths can you check the linked branch fixes this for you?

github-merge-queue bot pushed a commit that referenced this issue Mar 6, 2025
Because we're on an `EGLContextExecutor` we're probably on a different
thread to where the GL state is going to be used, *and* we don't
have implicit flushing with `eglMakeCurrent` happening (because
the context just stays current on the `EGLContextExecutor`.

If the GL implementation has per-thread execution queues (for example,
amdgpu by default), this might mean that the texture setup commands
aren't visible to command stream that's actually using the
texture.

Explicitly `glFlush()` after our texture setup, to ensure
these commands are visible to any `EGLContext` that might
need them.

Closes: #3792
Saviq pushed a commit that referenced this issue Mar 6, 2025
Because we're on an `EGLContextExecutor` we're probably on a different
thread to where the GL state is going to be used, *and* we don't
have implicit flushing with `eglMakeCurrent` happening (because
the context just stays current on the `EGLContextExecutor`.

If the GL implementation has per-thread execution queues (for example,
amdgpu by default), this might mean that the texture setup commands
aren't visible to command stream that's actually using the
texture.

Explicitly `glFlush()` after our texture setup, to ensure
these commands are visible to any `EGLContext` that might
need them.

Closes: #3792
Saviq added a commit that referenced this issue Mar 6, 2025
More complex than #3784, but
doesn't have unwanted side-effect

Fixes:  #3792
Saviq pushed a commit that referenced this issue Mar 6, 2025
Because we're on an `EGLContextExecutor` we're probably on a different
thread to where the GL state is going to be used, *and* we don't
have implicit flushing with `eglMakeCurrent` happening (because
the context just stays current on the `EGLContextExecutor`.

If the GL implementation has per-thread execution queues (for example,
amdgpu by default), this might mean that the texture setup commands
aren't visible to command stream that's actually using the
texture.

Explicitly `glFlush()` after our texture setup, to ensure
these commands are visible to any `EGLContext` that might
need them.

Closes: #3792
Saviq added a commit that referenced this issue Mar 6, 2025
More complex than #3784, but
doesn't have unwanted side-effect

Fixes:  #3792
Saviq pushed a commit that referenced this issue Mar 6, 2025
Because we're on an `EGLContextExecutor` we're probably on a different
thread to where the GL state is going to be used, *and* we don't
have implicit flushing with `eglMakeCurrent` happening (because
the context just stays current on the `EGLContextExecutor`.

If the GL implementation has per-thread execution queues (for example,
amdgpu by default), this might mean that the texture setup commands
aren't visible to command stream that's actually using the
texture.

Explicitly `glFlush()` after our texture setup, to ensure
these commands are visible to any `EGLContext` that might
need them.

Closes: #3792
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants