Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci] Use AZDO built-in parallelization strategy. #7804

Merged
merged 5 commits into from
Feb 27, 2023
Merged

Conversation

jpobst
Copy link
Contributor

@jpobst jpobst commented Feb 16, 2023

Previously, we split our MSBuildIntegration unit tests to run across multiple CI test agents. While a huge improvement, there are a few downsides to the approach we went with at the time:

  • The number of agents is hardcoded by copy/pasting steps in the CI YAML.
  • The tests must be manually load-balanced across agents with [Category ("Node-X")] in code which is cumbersome and hard to keep updated.

As we are at the point where our tests have expanded well past the 1 hour target we are faced with needing to increase our parallelization.

image

This time we're going to invest in a better fix for the above issues.

First, we can remove the YAML duplication by using AZDO's built-in parallel strategy, allowing us to control the number of agents to use with a single number:

  - job: 
    strategy:
      parallel: 4

This leaves us the issue of automatically splitting our tests among an unknown number of test agents. A clever solution is provided in the AZDO docs:

  • Use dotnet test --list-tests to find all the tests
  • Use a script to calculate which tests the agent should run, based on $(System.JobPositionInPhase) and $(System.TotalJobsInPhase)
  • Pass those test names into dotnet test as a filter

Unfortunately there are issues with the provided approach:

  • dotnet test --list-tests is not compatible with --filter so you have to run all tests in the test assembly which is not desirable for us. (Option --list-tests should take into account --filter option sdk#8643)
  • Passing test names (including test parameters) on the command line hits limitations with escaping certain characters and argument limits.

So the approach is good, but we have to do all the work ourselves. Enter dotnet-test-slicer.

This dotnet global tool:

  • Uses the NUnit.Engine NuGet package to find all tests in a test assembly using the desired filter query.
  • Slices the test list for the current test agent.
  • Outputs the desired tests into an NUnit-specific .runsettings file that can be passed to dotnet test --settings foo.runsettings, bypassing command line arguments.

Voila! Now we can automatically scale our test agents up or down as needed by changing a single variable in our YAML file.

image

Limitation: The tests are sliced in a round robin fashion. This can produce uneven results if test durations are uneven. A future optimization would be to store approximate test durations so tests can be load balanced more intelligently.

@jpobst jpobst force-pushed the azdo-parallel-tests branch 9 times, most recently from 99d3866 to 55f4f8e Compare February 22, 2023 06:32
@jpobst jpobst marked this pull request as ready for review February 22, 2023 21:44
Copy link
Member

@jonathanpeppers jonathanpeppers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was comparing two PR builds:

This PR ran 182,624 tests and the other one has 182,627 tests.

Is it worth checking if we somehow lost 3 tests? How can we tell?

@pjcollins
Copy link
Member

Is it worth checking if we somehow lost 3 tests? How can we tell?

The 3 test difference appears to be in the emulator tests, though I haven't looked through all of them yet:

image
vs
image

The additional jobs ultimately use more machine time as well, since we still have a somewhat expensive "provisioning" process for each test job that could use some optimization. Not having to tag every test/fixture with a node is definitely a plus though.

@jpobst
Copy link
Contributor Author

jpobst commented Feb 23, 2023

One of those tests (NativeAssemblyCacheWithSatelliteAssemblies) is added in that PR.

The other two are the permutations of XASdkDeployTests.DotNetDebug ([Values("net6.0-android", "net7.0-android")] string targetFramework), which was set to both "Node-3" and "Node-4", so it is currently being run on both in main:

https://github.com/xamarin/xamarin-android/pull/7804/files#diff-1c999605eb5b10629c064935812b504e78b6b0a8e6530a7beb0f8840a77f0edeL159-L160

The additional jobs ultimately use more machine time as well, since we still have a somewhat expensive "provisioning" process for each test job that could use some optimization.

Yep, this is something I am planning on looking at next.

Copy link
Member

@jonathanpeppers jonathanpeppers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we could try this in main for a while and see how it goes? Anyone else have concerns?

@jpobst jpobst merged commit d447bff into main Feb 27, 2023
@jpobst jpobst deleted the azdo-parallel-tests branch February 27, 2023 23:46
grendello added a commit to grendello/xamarin-android that referenced this pull request Feb 28, 2023
* main:
  [ci] Use AZDO built-in parallelization strategy. (dotnet#7804)
  Bump to dotnet/installer@e3ab0b5 8.0.100-preview.2.23123.10 (dotnet#7813)
  [ci] Run nunit tests with stable .NET version (dotnet#7826)
  [monodroid] Update p/invoke symbol names from net8 alpha1 (dotnet#7829)
  Localized file check-in by OneLocBuild (dotnet#7827)
grendello added a commit to grendello/xamarin-android that referenced this pull request Feb 28, 2023
* main:
  [ci] Use AZDO built-in parallelization strategy. (dotnet#7804)
  Bump to dotnet/installer@e3ab0b5 8.0.100-preview.2.23123.10 (dotnet#7813)
  [ci] Run nunit tests with stable .NET version (dotnet#7826)
  [monodroid] Update p/invoke symbol names from net8 alpha1 (dotnet#7829)
  Localized file check-in by OneLocBuild (dotnet#7827)
grendello added a commit to grendello/xamarin-android that referenced this pull request Feb 28, 2023
* main:
  [ci] Use AZDO built-in parallelization strategy. (dotnet#7804)
  Bump to dotnet/installer@e3ab0b5 8.0.100-preview.2.23123.10 (dotnet#7813)
  [ci] Run nunit tests with stable .NET version (dotnet#7826)
  [monodroid] Update p/invoke symbol names from net8 alpha1 (dotnet#7829)
  Localized file check-in by OneLocBuild (dotnet#7827)
grendello added a commit to grendello/xamarin-android that referenced this pull request Feb 28, 2023
* main:
  [ci] Use AZDO built-in parallelization strategy. (dotnet#7804)
  Bump to dotnet/installer@e3ab0b5 8.0.100-preview.2.23123.10 (dotnet#7813)
  [ci] Run nunit tests with stable .NET version (dotnet#7826)
  [monodroid] Update p/invoke symbol names from net8 alpha1 (dotnet#7829)
grendello added a commit to grendello/xamarin-android that referenced this pull request Mar 6, 2023
* main: (22 commits)
  Bump to dotnet/installer@632ddca 8.0.100-preview.3.23128.1 (dotnet#7836)
  LEGO: Merge pull request 7852
  [ci] Reduce overhead for MSBuildIntegration unit test jobs. (dotnet#7832)
  [ci] Allow dynamic`$(NuGetArtifactName)` values (dotnet#7848)
  [Xamarin.Android.Build.Tasks] guard `AutoImport.props` against empty values (dotnet#7837)
  [Mono.Android] Print type & member remapping info (dotnet#7844)
  [Mono.Android] Tweak AndroidMessageHandler behavior for WCF support (dotnet#7785)
  LEGO: Merge pull request 7845
  Localized file check-in by OneLocBuild Task (dotnet#7842)
  [ci] Use compliance stage template (dotnet#7818)
  [build] pass `--skip-sign-check` to `dotnet workload` (dotnet#7840)
  Replace K4os.Hash.xxHash with System.IO.Hashing (dotnet#7831)
  $(AndroidPackVersionSuffix)=preview.3; net8 is 34.0.0-preview.3 (dotnet#7839)
  [Xamarin.Android.Build.Tasks] Remove support for mkbundle (dotnet#7772)
  [Xamarin.Android.Build.Tasks] `unable to open file as zip archive`? (dotnet#7759)
  [monodroid] Properly process satellite assemblies (dotnet#7823)
  Bump to xamarin/java.interop/main@77800dda (dotnet#7824)
  [ci] Use AZDO built-in parallelization strategy. (dotnet#7804)
  Bump to dotnet/installer@e3ab0b5 8.0.100-preview.2.23123.10 (dotnet#7813)
  [ci] Run nunit tests with stable .NET version (dotnet#7826)
  ...
jpobst added a commit that referenced this pull request Mar 10, 2023
Bring the AzDO parallelization from #7804 and environment setup improvements from #7832 to `Xamarin.Android.Build.Tests.csproj` based test suites.  This includes both the main `MSBuild` stage and the `Smoke Tests` jobs.

Increases parallelization of all jobs as many were approaching ~90 minutes.

As there is no longer a place in the `MSBuild` stage to run `Xamarin.Android.Tools.Aidl-Tests` tests, it was moved to the `macOS > Tests > APKs .NET` stage.  This suite should be fine to only run on Mac and not Windows.  (Note this test assembly was also updated to .NET 7. This required moving it from `Xamarin.Android-Tests.sln` which is currently built with Mono which cannot build .NET 7+ projects. It now is built via `Xamarin.Android.sln`.)
jpobst added a commit that referenced this pull request Mar 14, 2023
Context: #7850 (comment)

With the re-parallelization of our test suites (#7804, #7850), combined with the fact that most PR's end up running the full test suite anyways, the separate "Smoke Tests" jobs aren't really saving us much, if any, CI time.  We feel we would be better off using those additional agents to run the full test suite faster.

This PR removes the separate "Smoke Tests" jobs and runs all tests in the same jobs.

Additionally, as the `[Category ("SmokeTests")]` NUnit attribute is now only used by the Windows build stage to run in-tree tests, it has been cut back to the absolute bare minimum, saving ~45 minutes.  Note that all the tests will continue to be run against the Windows installer, this only runs fewer tests for the Windows in-tree tests.
@github-actions github-actions bot locked and limited conversation to collaborators Jan 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants