[question] isolation + threadlocal + test selection with miri #1476
The command you gave works for me and we are even testing that on CI. So I think we need some more details here to figure out what is happening.
Miri forces the number of threads to be 1, which you can replicate outside of Miri using `cargo test -- --test-threads=1`. But other than that, Miri's implementation of thread-local storage should be correct, so it would be interesting to figure out the underlying problem here.
It does; Ctrl-F for "config flag" in our README. But of course it would be better to figure out what actually happens.
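As a concrete illustration (a minimal sketch, not code from this thread; `assert_released` is the helper named later in the discussion, and both bodies here are hypothetical):

```rust
// `cfg(miri)` is set for code interpreted by Miri, analogous to
// `cfg(test)` under the test harness, so a custom leak check can
// be compiled out when Miri's own leak detection takes over.
#[cfg(not(miri))]
fn assert_released() {
    // Run the custom leak check as usual.
}

#[cfg(miri)]
fn assert_released() {
    // Under Miri, rely on its built-in leak detector instead.
}
```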
Miri releases program memory when the program frees it. However, when the program does a lot of work on the same memory, a lot of metadata for the alias tracking can accumulate. You could try to see if running with `-Zmiri-disable-stacked-borrows` makes a difference there. Also, since you mention concurrency, you should have gotten a warning that concurrency support is still experimental. In particular, thread-local storage is not freed currently (#1369). (We also keep some thread metadata around forever after the thread dies, but that's just ~100 bytes per thread so I doubt it will be a problem.)
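One common mitigation for this kind of memory growth (a hedged sketch; `STRESS_ITERS` and the test name are illustrative, not from this thread) is to scale long-running workloads down under Miri so the accumulated metadata stays small:

```rust
// Miri runs are much slower and accumulate alias-tracking metadata,
// so stress tests are typically scaled down under `cfg(miri)`.
#[cfg(miri)]
const STRESS_ITERS: usize = 100;
#[cfg(not(miri))]
const STRESS_ITERS: usize = 100_000;

#[test]
fn stress() {
    for i in 0..STRESS_ITERS {
        // ... exercise the datastructure with key `i` ...
        let _ = i;
    }
}
```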
❤️ I am glad that Miri is helpful for you. :)
As mentioned it was likely my fault, and as soon as I write reproduction instructions, it works ... I think what happens is when you run a single test it passes, then it immediately compiles and tests a second binary, so the output of the second binary shows 0 tests, and that's what confused me! Really sorry about that.
It works with many threads and threads=1 outside of miri, even under stress, which is why I'm a bit confused and possibly assuming incorrectly what the cause could be.
Sure, but the other thing in my mind is that the "failure" right now happens in leak detection, which miri will do for me. So while maybe it's an unknown, this does seem like the correct option: disable my own leak detector when running under miri. That also removes the need to have thread-locals etc., which, as you point out, could also be an issue. Another thought that comes to mind: I was previously testing on macOS and now I'm testing on Linux, so I wonder if there is some platform-specific issue occurring.
This could be related; as mentioned, my leak tracking records every single allocated datastructure node id, and that set could grow quite large on some tests. And of course, miri would also check those leak-tracking sets for correctness too. So I'm thinking the correct answer is as before: disable my leak tracker while under miri, as it's double-handling the same problem. Anyway, I've used the cfg there to disable the leak tracker under miri during this test, and indeed I can see a memory leak, so again, very likely a mistake in my program rather than miri :) Thanks again!
Right, I understand the cause of this issue now. During a normal test, as I allocate and recreate each tracking set for the allocated nodes, it would only show leaks "from that test case". There were 5 tests in my iter.rs module where I was not assigning a tree root correctly, causing those tests to leak. But in those 5 tests I was not calling my `assert_released` function to ensure everything was cleaned up. So on the "next test" the allocation set was replaced and those leaks were "missed". That means subsequent tests also correctly reported they were not leaking (because they weren't!).

However, when running under miri, this replacement wasn't occurring. So once the leafiter tests had been executed, the allocation sets persisted, which caused subsequent tests to fail: the tracking set contained the leaked nodes from the leafiter tests, and later tests like from_iter were checking `assert_released` and detecting the nodes leaked from previous tests.

So I have now fixed the leak in my iter tests, and everything passes once again (I'll leave it to you to decide if this leak counts as a trophy, as the leak only affected tests, not the core datastructure, but miri was still invaluable in helping to find it and giving me trust in my code!). During the process I did notice one "quality of life improvement" though. I noticed with a subset of tests you get output like this:
Now this is all well and good, we know we have leaks, but:
It would be "interesting" to perhaps track, at the least, the top-level function or test case that each allocation belongs to, so that you can more easily isolate which test is leaking. Alternately, perhaps a "bisect" function where, if you have a leak, the tests are re-run and reduced to the set of cases that do have the leak. Anyway, once again, thanks for your time and patience with my questions, and I'm glad to have solved this. Thanks mate!
Miri is deterministic unless you access the operating system. We have a
That seems like something that the testing infrastructure could benefit from independently of miri.
That should never happen; we do not deallocate those. Why would whatever code "resets" your allocation statistics not run in Miri? Or was that "resetting" relying on being in a new thread with clean TLS? But then outside Miri with just a single thread for the test harness, you should have the same problem.
That should only happen if your program actually does something differently -- Miri itself introduces no non-determinism, as @oli-obk said. But if e.g. the exact number or size of allocations depends on things returned by the OS and those differ between runs, then alloc IDs can start to diverge.
Please open a separate issue for that. However, the test harness for Miri is just another program, and the top-level function is "main" of the test harness, so I have no good idea for how to achieve this. Miri doesn't even know it is running a test suite; it just runs the MIR that is generated by rustc.
While there is certainly unsafe in my library, I only use std collections and safe constructs in this tracker. The definition is:
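(The definition itself was not preserved in this thread; a plausible sketch of such a tracker, using only std collections and a thread-local as described, might look like the following. All names besides `assert_released` are hypothetical.)

```rust
use std::cell::RefCell;
use std::collections::BTreeSet;

thread_local! {
    // Ids of currently-allocated nodes. Outside Miri the test harness
    // runs each test on its own thread, so this set starts empty for
    // every test; under Miri all tests share one thread (and TLS is
    // not freed), so leaked ids persist into later tests.
    static TRACKER: RefCell<BTreeSet<usize>> = RefCell::new(BTreeSet::new());
}

fn track_alloc(id: usize) {
    TRACKER.with(|t| t.borrow_mut().insert(id));
}

fn track_release(id: usize) {
    TRACKER.with(|t| t.borrow_mut().remove(&id));
}

fn assert_released() {
    // A leaking test that never calls this passes silently outside
    // Miri; under Miri its leaked ids surface in a later test's check.
    TRACKER.with(|t| assert!(t.borrow().is_empty()));
}
```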
At this point I'm suspecting that it's very likely something I am doing wrong, rather than a fault in miri, so I will think about this more to see if I can identify a reproducer for you.
I'm using Box, and those boxes are the allocations in question. So I suspect you are correct that this involves the OS allocator.
That's true. Thought needed on how to achieve this. But I will open the issue.
miri injects its own allocator. By "using the OS", Ralf means reading files or accessing the file system. Hypothetically even using syscalls, but I think we have no problematic ones there.
Right, if e.g. the content of a file you are reading or an env var changes, that could lead to different program behavior and thus different allocation IDs. The allocator itself is fully deterministic in Miri.
I'm not reading any files, using any time sources, or changing environment variables. I do have isolation disabled though, because a separate test does need the monotonic clock. Perhaps this is related.
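For context, the kind of code that forces isolation to be disabled is any OS access, e.g. reading the monotonic clock (a sketch; the `measure` helper is illustrative, not from this thread):

```rust
use std::time::{Duration, Instant};

// `Instant::now()` asks the OS for the monotonic clock, which Miri
// only permits when run with `-Zmiri-disable-isolation`.
fn measure<F: FnOnce()>(f: F) -> Duration {
    let start = Instant::now();
    f();
    start.elapsed()
}
```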
Even if you don't change them, disabling isolation causes Miri to read all env vars on startup. That can already introduce differences.
That probably explains that. Thanks for your time!
The thing we still haven't explained is why your leak checker ceased to work under Miri -- or rather, why its state wasn't reset the way we'd expect. Would still be interesting if you could minimize that into code that you can share.
I'm trying to understand that still, and I'll report when I understand what's going wrong :)
Hi - this is a question about miri, and not a bug. It's quite likely that I'm doing something wrong.
I'm using miri with concread due to the amount of unsafe + potential concurrency issues that can exist in a datastructure. So far I believe that miri is "passing" but I'm seeing a couple of odd behaviours.
First, I can't seem to run miri against "targeted" tests. For example:
This still runs "all tests" rather than the subset I want to target. What's the correct way to run miri against only a subset of my test suite?
I'm also seeing test failures such as:
Without miri, this test passes. The failure is occurring in a memory-leak check that runs inside my tests, but it relies on thread-locals / globals, and I'm wondering if disabling isolation is potentially causing the threads to stomp on each other somehow. There are no leaks in the full tree or stress tests, so I'm not sure it's a leak in my datastructure, but it's probably my fault in how I'm doing the leak check or using miri here.
A way to fix this could be a cfg(miri) flag (similar to cfg(test)); then I could skip the custom leak check. Does such a flag exist?
Finally, I see my system memory grow to huge amounts - I exceeded 8GB of RAM during one attempt and started to swap. Does miri not release its data until the end of execution or something similar? Or could it be due to the size of the tests I'm attempting to run (I have some multithreaded stress tests still in here)?
Anyway, thanks for your time, and again, thanks for making such an excellent tool!