Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zombies ate my 3DS? (kernel panic after multiple 3dsx app exits) #48

Closed
AzureMarker opened this issue Feb 14, 2022 · 10 comments · Fixed by rust3ds/pthread-3ds#14
Closed

Comments

@AzureMarker
Copy link
Member

AzureMarker commented Feb 14, 2022

Fun title aside, some odd things happen if you run a 3dsx program a bunch of times. When exiting the 5th run, I get an ARM exception: prefetch abort (kernel panic). This might be related to the std threads, or it's always been there, but it's pretty weird.

The "Fault status" is "Debug event" and it's an exception on core 1 (sys core, which makes sense since the error says kernel panic).

I noticed in Luma3DS's process list that there's zombie 3dsx_app processes when you open/close a program a few times. It panics when there are 4 zombies and you close the 5th 3dsx_app process.

The program I can definitely reproduce this with is thread-info in #46, but I also get it when running the ctru-rs tests (also under std threads though). I haven't tried testing more programs yet, or without std threads.

@Meziu
Copy link
Member

Meziu commented Feb 14, 2022

Hmm. Are we sure the threads are deallocated properly? Especially those on the system core. We should check for that and see what we get.

@AzureMarker
Copy link
Member Author

AzureMarker commented Feb 16, 2022

I did some testing and I only see this happening when svcGetThreadList is called (all others, like the hello world program, don't create zombies). I found a relevant system flaw documented here about this system call not decrementing a reference count: http://3dbrew.org/wiki/3DS_System_Flaws#ARM11_software.

Note that we're currently using this system call as part of pthread_getschedparam/pthread_setschedparam (and therefore std::os::horizon::thread::current_priority) via get_thread_handle here:
https://github.com/Meziu/pthread-3ds/blob/187d2ae5db7c4571db4dcec3ace3bc2a617361e9/src/lib.rs#L111

We have a few options:

  1. Find a way to decrement the reference count ourselves.
  2. Avoid using this system call (find another way to get the main thread's handle).
  3. Replace the standard pthread functions for priority with a single pthread_getpriority_np function that can just use CUR_THREAD_HANDLE (no searching for the specified thread's handle and having to handle the main thread case).

@AzureMarker
Copy link
Member Author

I opened rust3ds/pthread-3ds#12 to avoid calling svcGetThreadList during thread spawning (when creating the default pthread attributes value).

@Meziu
Copy link
Member

Meziu commented Feb 16, 2022

Is this issue closed? I couldn’t replicate it myself, though it seems like you fixed it(?). Tell me if anything is needed.

@ian-h-chamberlain
Copy link
Member

ian-h-chamberlain commented Feb 16, 2022

I think it's still an issue for the APIs that call svcGetThreadList, which includes e.g. pthread_{get,set}schedparam.

Here's an idea: what if instead of using ctru_sys::Thread as pthread_t, we used a simple struct like this:

// pthread-3ds
struct PThread {
    ctru_thread: ctru_sys::Thread,
    thread_id: u32, // or whatever type this needs to be, I think it's u32
}

// libc
pub type pthread_t = *mut ::c_void; // really a *mut PThread

I think then we could use threadGetHandle where needed, or svcOpenThread as a fallback if ctru_thread.is_null() (main thread case), to get the the handle for using the priority APIs etc.

There would be a little bit of extra memory management and bookkeeping, but not too much different than what we already have.

Let me know what you think, I'm fairly sure this would work and could avoid some of these issues. I will try to get a proof of concept working when I have time as well.


Side note: I've noticed that the result of threadGetHandle seems to be different than the result of calling svcGetThreadId -> svcOpenThread to get that handle... so perhaps there is some discrepancy or internal handle state for which we would want a reference to both anyway.

@AzureMarker
Copy link
Member Author

AzureMarker commented Feb 16, 2022

I was writing a comment but forgot to finish it. I haven't read @ian-h-chamberlain's comment fully yet, but the issue isn't currently fixed because the thread list function is still used by std::os::horizon::thread::current_priority.

@AzureMarker
Copy link
Member Author

AzureMarker commented Feb 17, 2022

@ian-h-chamberlain That sounds like a good solution, do you want to make a PR?

Edit: not sure if we need to change libc. Check what the other platforms do. I think as long as the ABI is the same between long and *void it's fine. But feel free to make the change since it's technically more accurate.

@AzureMarker
Copy link
Member Author

AzureMarker commented Feb 17, 2022

@Meziu @ian-h-chamberlain let me know also if you're having trouble reproducing the issue. It should definitely reproduce if you call svcGetThreadList, even in just a temporary example. But the thread-info example from #46 is one of the ways I reproduced it.

You should be able to open Luma3DS's menu and look at the process list, and at the bottom see some 3dsx_app processes marked as Zombie.

@ian-h-chamberlain
Copy link
Member

ian-h-chamberlain commented Feb 17, 2022

Edit: not sure if we need to change libc. Check what the other platforms do. I think as long as the ABI is the same between long and *void it's fine. But feel free to make the change since it's technically more accurate.

Looks like it varies by platform, some use various int type,s, some use *c_void, so we're probably fine here.

I'll open a PR based on what I have now, although it's unfortunate that any example using svcGetThreadList will still zombie... :(

AzureMarker added a commit that referenced this issue Feb 17, 2022
The thread list syscall has a bug which causes the process to become a
zombie. See #48.
@AzureMarker
Copy link
Member Author

Yeah, I've been scouring the internet looking for anything else that references the issue, but I haven't found anything. Luckily only thread-info uses it explicitly, and I've just pushed a change to remove that usage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants