System.Linq.Parallel not finishing in 25 minutes in CI runs #29123
Comments
@wfurt could you please provide a machine, or instructions on how we can look at this issue? |
CC @janvorli in case he has some ideas on how we can profile this test to find out where the time is spent. I am guessing this could be an arm64 synchronization issue? |
I'll check with @ulisesh. On the machine I use, it takes ~10 minutes to complete. |
Was this release / checked or debug? |
BTW ARM64 builds are now completely busted. Once they are back to normal, my plan is to inject a fake test to dump CPU and process info. |
Now I've noticed the |
OK. I'll work with @ulisesh. It is a container, and the machine I was testing on has only 6 cores. So the behavior could be different. |
Do you have more info about the thread sync issue? Is this tracked by some other issue? |
@tarekgh no, it is just a guess. There is no evidence from other issues that we have such a problem. But this test is specific: there is large-scale parallelism on the CI hardware, which led me to the guess. |
I would suggest first trying to repro it on the 46 core machine and see if it is a hang or if it just takes a long time to complete. |
Here's my psychic debugging guess (which could be wrong): Even if the min thread count matched, if ProcessorCount returned 46, it would still make some queries operate on significant amounts of data, e.g. |
I can check that @stephentoub. Is there an easy way to dump the ThreadPool's parameters? I can make them part of the test output for future use. |
I'd suggest dumping the results of: |
Thanks. I'll do it as part of the test run and share results and timing from the 46-core machine. |
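For readers following along, here is a minimal sketch of how thread pool parameters could be dumped from a test. It is an assumption about what is useful to log (the values mentioned later in the thread: MinThreads, AvailableThreads, ProcessorCount), not necessarily the exact list suggested above. `ThreadPoolInfo` and `Describe` are hypothetical names.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the corefx test suite.
public static class ThreadPoolInfo
{
    // Collects the processor count and the thread pool limits into one line
    // that can be written to the test output.
    public static string Describe()
    {
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        ThreadPool.GetAvailableThreads(out int availWorker, out int availIo);

        return $"ProcessorCount={Environment.ProcessorCount}; " +
               $"MinThreads={minWorker}/{minIo}; " +
               $"MaxThreads={maxWorker}/{maxIo}; " +
               $"AvailableThreads={availWorker}/{availIo}";
    }
}
```

Calling `Console.WriteLine(ThreadPoolInfo.Describe())` (or writing to the test's output helper) at the start of a run would record the values alongside the timings. Each pair is worker threads / I/O completion threads.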
Helpers/Sources.cs: public static readonly int OuterLoopCount = 4 * 1024 * Environment.ProcessorCount; |
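To make the scaling concrete, here is a small sketch that evaluates the same formula for the two machine sizes mentioned in this thread (6 and 46 cores). `OuterLoopScaling` is a hypothetical wrapper; only the formula comes from the quoted line.

```csharp
using System;

// Hypothetical wrapper around the formula quoted from Helpers/Sources.cs.
public static class OuterLoopScaling
{
    // Same expression as the quoted line, with the processor count passed in
    // explicitly instead of read from Environment.ProcessorCount.
    public static int OuterLoopCount(int processorCount) => 4 * 1024 * processorCount;
}
```

`OuterLoopScaling.OuterLoopCount(6)` is 24576 and `OuterLoopScaling.OuterLoopCount(46)` is 188416, so any test sized by this constant works on roughly 7.7 times more elements on the 46-core machine.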
@wfurt - Is this actually doing the outerloop run, though? Although, like @stephentoub points out, similar stuff shows up in a couple of other places. |
@stephentoub I see your point, but I am wondering if we have a scaling problem in general? I expect that with more cores we can handle more data in parallel (unless there are other factors, like other things running on the machine at the same time). I know we cannot judge that before collecting more data, as you have suggested. Also, can we try running this test separately, while not running any other test, and see if this makes any difference? |
I did more testing and the test always finished without failures. Note that it has instructions for xUnit so that it only runs one test at a time. All tests below are from a live machine with no other significant load. In some cases I did multiple runs and got consistent numbers. At first I wanted to compare a run on the bare OS vs. unlimited Docker (outerloop).
The Docker run is a little bit slower, but not by much. When running on the base OS I was also watching CPU utilization with top. The dotnet process would take 500-600% CPU, i.e. the equivalent of 5-6 cores, and the system was 70-80% idle. That generally means that either we cannot properly parallelize the task, we have issues in the runtime, or we hit other limits, like memory bandwidth. To get more data, I did more tests with Docker and various numbers of cores assigned to the container.
Note that the MinThreads is equal to ProcessorCount. I wrote the stats as another test, so the AvailableThreads are not collected during the test run. I can probably create a coredump during a run if anybody wants to take a closer look. I also did a quick test where |
This is very useful info @wfurt. Thanks. If I am not mistaken, the test I am seeing And I am seeing the UnaryOperations is used multiple times in the test, which means the parallelism in the test is much more than the processor count. So, I expect increasing the processor count will just increase the time of the test dramatically and not linearly. @wfurt is it possible to have one more trial to change the line the same way you tried |
...uh, what? We probably should change that line to something like:
// DefaultSize is 16
yield return new object[] { Label("WithDegreesOfParallelism", (start, count, source) => source(start, count).WithDegreeOfParallelism(Math.Min(DefaultSize / 4, Environment.ProcessorCount))) };
...because the combinatorial tests use a fixed size of data. Most of the time taken by the PLINQ outerloop tests is just due to the fact that there's so many combinatorial tests, and all tests have to be run sequentially. |
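A standalone sketch of the capping idea from the comment above: the degree of parallelism is bounded by a constant derived from the data size, so it stops growing with the machine. `CappedParallelism`, `Degree` and `SumOfSquares` are hypothetical names for illustration; only `DefaultSize = 16` and the `Math.Min(DefaultSize / 4, ...)` expression come from the comment.

```csharp
using System;
using System.Linq;

// Hypothetical illustration, not the actual test code.
public static class CappedParallelism
{
    // Value taken from the "DefaultSize is 16" remark in the comment above.
    public const int DefaultSize = 16;

    // Never ask for more workers than a quarter of the data size,
    // regardless of how many cores the machine reports.
    public static int Degree(int processorCount) =>
        Math.Min(DefaultSize / 4, processorCount);

    // Runs a small PLINQ query with the capped degree of parallelism.
    public static int SumOfSquares(int start, int count) =>
        ParallelEnumerable.Range(start, count)
            .WithDegreeOfParallelism(Degree(Environment.ProcessorCount))
            .Select(x => x * x)
            .Sum();
}
```

With this cap, `CappedParallelism.Degree(46)` is 4 and `CappedParallelism.Degree(2)` is 2, so a 46-core machine runs the fixed-size query with the same degree of parallelism as a 4-core one.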
Either modification still gives a runtime of around 50 minutes.
I attached testResults.xml from the run. It seems like the longest test was ~2s, but there are just too many:
|
Thanks again @wfurt I want to clarify |
I can easily change both. I have a build tree set up for cross-compilation as well as access to the test machine. |
I replaced all instances of ProcessorCount with the magic constant 4. With that, the outerloop test run on the 46-core system takes 49 minutes. So it seems like the major slowdown does not come from parallelism in the test itself. |
@wfurt - by default, PLINQ attempts to use What happens if you add So, for the unordered sources: yield return Label("ParallelEnumerable.Range", (start, count, ignore) => ParallelEnumerable.Range(start, count).WithDegreeOfParallelism(4)); |
I was able to get some improvements @Clockwork-Muse. |
Is it possible we can get a machine or VM for a couple of days we can use to investigate more? |
@ulisesh can give you access. |
@ulisesh would the machine be isolated and not used for any other purpose? I won't disturb any running service there. |
I don't think there's anything here that's actionable, so I'm going to close it. We already changed some of the PLINQ tests based on this to not run based on machine configuration (those tests refer to this issue in the source, and we can leave those references for context), in situations where it would try to take over gigantic machines with tons of cores and throw lots of work at them. |
We have issues with System.Linq.Parallel timing out sometimes. I bumped the timeout to 25 minutes and the test still did not finish. Somebody should look into it.