
PR to see what work needs to be done to merge matrix-testing into master #8327

Closed
wants to merge 108 commits into from


whitneyschmidt
Contributor

This is a test PR to continue iterating on pipeline work & resolve any conflicts with master.

mandel-macaque and others added 25 commits February 15, 2020 00:43
After reading the code of the parser used in VSTS, I found out that the
`name` attribute is not used; `fullname` is. That attribute is the one
their XML parser uses to set 'testCaseResultData.AutomatedTestName'.
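A minimal sketch of emitting an NUnit v3 `test-case` so the publisher gets a meaningful test name, assuming only what the commit states (VSTS reads `fullname`, not `name`). `NUnitTestCaseWriter` and `CreateTestCase` are hypothetical names, not part of xharness.

```csharp
using System.Xml.Linq;

// Hypothetical helper (not the xharness API) that emits an NUnit v3 <test-case>.
// Per the commit message, the VSTS parser ignores "name" and reads "fullname"
// into testCaseResultData.AutomatedTestName, so "fullname" must be meaningful.
public static class NUnitTestCaseWriter
{
    public static XElement CreateTestCase(string fixture, string testName, string result) =>
        new XElement("test-case",
            new XAttribute("name", testName),
            // Fully qualified name: the attribute the publisher actually uses.
            new XAttribute("fullname", $"{fixture}.{testName}"),
            new XAttribute("result", result));
}
```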
…back

Our bots spend most of the time compiling and deploying rather than
executing tests. This means that a bot with an iOS device is most of the
time not really running tests, but deploying them. With the old pipeline
the average runtime to get feedback for all the device tests is around 6
to 7 hours.

To get results faster, the new pipeline spawns 5 different testing jobs
plus a final, 6th job that fans in and reports completion to the
monitoring person.

The tests have been divided in the following way:

1. Xamarin tests, which are all the tests that are neither BCL tests nor
monotouch.
2. Monotouch tests, since it is a large set of tests.
3. All xUnit-based BCL tests.
4. All NUnit-based BCL tests.
5. mscorlib tests (all 3 split assemblies).

On average, the individual jobs take the following times:

1. Xamarin tests: 1h 30m
2. Monotouch tests: 45-50m
3. xUnit-based BCL: 1h 30m
4. NUnit-based BCL: 45m
5. mscorlib: 1h 30m

In a perfect world with infinite bots, we could get device test results in
under 2 hours; in reality we will probably get them in around 3 hours,
which is half the current time.
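The arithmetic behind these estimates, as a small sketch: with a single bot the jobs run back to back, so wall-clock time is their sum; with one bot per job it is bounded by the slowest job. `PipelineTiming` is a hypothetical helper, and Monotouch is counted at the upper 50m estimate.

```csharp
using System;
using System.Linq;

// Hypothetical helper: average job durations taken from the list above
// (Monotouch counted at the upper 50m estimate).
public static class PipelineTiming
{
    public static readonly (string Job, TimeSpan Duration)[] Jobs =
    {
        ("Xamarin tests", TimeSpan.FromMinutes(90)),
        ("Monotouch tests", TimeSpan.FromMinutes(50)),
        ("xUnit-based BCL", TimeSpan.FromMinutes(90)),
        ("NUnit-based BCL", TimeSpan.FromMinutes(45)),
        ("mscorlib", TimeSpan.FromMinutes(90)),
    };

    // One bot runs every job back to back: wall-clock time is the sum.
    public static TimeSpan Sequential() =>
        Jobs.Aggregate(TimeSpan.Zero, (total, j) => total + j.Duration);

    // One bot per job: wall-clock time is bounded by the slowest job
    // (ignoring agent queueing and the short fan-in job).
    public static TimeSpan FullyParallel() => Jobs.Max(j => j.Duration);
}
```

`Sequential()` comes to 6h 05m, in line with the 6 to 7 hours quoted above, while `FullyParallel()` comes to 1h 30m; the realistic 3 hour figure sits between the two because there are only a limited number of device bots.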

To make things easier in the future, several YAML templates have been
created under tools/devops/templates:

- tools/devops/templates/device-tests.yml
  Basic template that contains all the steps needed to run the tests on
  device. It takes two env vars: one contains the extra labels to pass to
  xharness, the other the context used to set the device tests status.

- tools/devops/templates/job-matrix.yml
  Main template for creating pipelines with all 5 jobs targeting
  different devices. The template takes a number of parameters to set the
  targeted device and the capabilities of the agents.

- tools/devops/templates/publish-results.yml
  Small helper template for posting messages.

Once the 5 jobs are completed, a 6th job is kicked off that does not need an
iOS/tvOS device (we have more agents without those capabilities). The job
grabs the artifacts from the previous jobs, unzips them and publishes the
test results to VSTS. Once completed, it reports the result as a status
on GitHub. This job runs regardless of the results of the previous jobs
(notice the always() condition).
On devices that cannot reach the host via TCP we do not have a log, which
means the if statement needs a case for it.

The main problem is that when the device cannot connect to the host, we
get neither a log nor a crash reason from the crash logs. It makes sense
not to have a crash reason, because the app did not crash. In these
situations we have to create an XML crash report (since we do not really
know whether we can parse the file) that tells VSTS there was an issue.
Adding the main log lets the monitoring person see the results of the
test run.
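A minimal sketch of the missing-log case described above, assuming an NUnit v3 style report; `DeviceResultReport` and `LoadOrCreateFailure` are hypothetical names, and attaching the main log through NUnit 3's `attachments/attachment/filePath` elements is an assumption about how "adding the main log" could be expressed.

```csharp
using System.IO;
using System.Xml.Linq;

// Hypothetical sketch (not the xharness API). When the device could not reach
// the host over TCP there is no result log, and no crash reason either, so
// synthesize a failure report instead of publishing nothing.
public static class DeviceResultReport
{
    public static XDocument LoadOrCreateFailure(string resultLogPath, string appName, string mainLogPath)
    {
        // Normal case: the device reached the host and a result log exists.
        if (File.Exists(resultLogPath))
            return XDocument.Load(resultLogPath);

        // No log: report a single failed test case and attach the main log
        // (assumed NUnit 3 attachment elements) for the monitoring person.
        return new XDocument(
            new XElement("test-run",
                new XAttribute("result", "Failed"),
                new XElement("test-case",
                    new XAttribute("name", appName),
                    new XAttribute("fullname", appName),
                    new XAttribute("result", "Failed"),
                    new XElement("failure",
                        new XElement("message", "The device could not connect to the host; no test log was produced.")),
                    new XElement("attachments",
                        new XElement("attachment",
                            new XElement("filePath", mainLogPath),
                            new XElement("description", "Main log"))))));
    }
}
```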
The XML should be:
```xml
<failure>
  <message>Foo</message>
  <stack-trace>Bar</stack-trace>
</failure>
```

But we generate:

```xml
<failure>
  <message>Foo
    <stack-trace>Bar</stack-trace>
  </message>
</failure>
```

This makes parsing the failures impossible.
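One way to avoid the broken nesting is to build the element with `System.Xml.Linq` instead of concatenating strings, so `<message>` and `<stack-trace>` end up as siblings in the expected shape shown in the first snippet above. This is a sketch, not the actual fix in the branch; `FailureElement` is a hypothetical name.

```csharp
using System.Xml.Linq;

// Hypothetical helper: <message> and <stack-trace> are siblings under
// <failure>, which is the shape the publisher can parse. XElement also
// escapes the text, so a stack trace containing '<' cannot break the XML.
public static class FailureElement
{
    public static XElement Create(string message, string stackTrace) =>
        new XElement("failure",
            new XElement("message", message),
            new XElement("stack-trace", stackTrace));
}
```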
To simplify the monitoring job, add the device name to the
following failures:

* Installation
* Launch
* Tcp Connection

All of the above are usually caused by a misconfigured device. The
device name helps the monitoring person reach IT so the issue can be
addressed.
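A sketch of including the device name in the three failure kinds listed above (Installation, Launch, Tcp Connection). The `DeviceFailure` enum, `FailureMessages.Describe` and the message wording are hypothetical, not the real xharness types.

```csharp
using System;

// Hypothetical failure kinds mirroring the list above.
public enum DeviceFailure { Installation, Launch, TcpConnection }

public static class FailureMessages
{
    public static string Describe(DeviceFailure kind, string appName, string deviceName)
    {
        string what = kind switch
        {
            DeviceFailure.Installation => "Installation failed",
            DeviceFailure.Launch => "Launch failed",
            DeviceFailure.TcpConnection => "TCP connection to the host failed",
            _ => throw new ArgumentOutOfRangeException(nameof(kind)),
        };
        // These failures usually point at a misconfigured device, so name it
        // explicitly to help the monitoring person contact IT.
        return $"{what} for {appName} on device '{deviceName}'.";
    }
}
```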
AppName should only be the app name, since we are also passing the variation
as a parameter. Otherwise we end up with $"{appname} {variation} {variation}".
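A tiny illustration of the duplication, assuming the title is built as `"{appName} {variation}"`; `TestTitle.Format` is a hypothetical stand-in for the real formatting code.

```csharp
// Hypothetical stand-in for the title formatting: the variation is appended
// here, so AppName must not already contain it.
public static class TestTitle
{
    public static string Format(string appName, string variation) =>
        string.IsNullOrEmpty(variation) ? appName : $"{appName} {variation}";
}
```

Passing a bare app name yields `monotouch-test Debug`; passing an AppName that already includes the variation yields `monotouch-test Debug Debug`, which is the bug being fixed.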
The publishing tool is a little fragile. In run
https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=3490479&view=logs&j=67d14776-f827-5fe4-2625-2db4b5987fd1&t=fa262eec-9d97-5ba4-b4cc-a9292beecd8f
I noticed that valid test runs with a failing test (launch issues) were
not being uploaded.

I found out that the reason is a flaw in the logic of the publishing
tool's parser. The tool assumes that if there is no start-time, there
are no test results (remember that NUnit v3 is schemaless, so we don't
know exactly which attributes are compulsory).

The culprit is this line: https://dev.azure.com/mseng/AzureDevOps/_git/AzureDevOps?path=%2FTa%2FTasks%2FPublishTestResults%2FParser%2FNUnitResultParser.cs&version=GBmaster&line=473&lineEnd=473&lineStartColumn=63&lineEndColumn=64&lineStyle=plain

Basically:

```csharp
if (testRunNode?.Attributes?["start-time"] != null) {
    // import test data
}

// otherwise nothing interesting happens: the parser assumes there is no data
```

This commit fixes it by setting the start time to the current time; the exact
value does not matter since it is a failure XML result.
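A sketch of the workaround: stamp the synthesized `<test-run>` with a start time so the parser check quoted above does not skip it. `StartTimeFix` is a hypothetical name, and the `"u"` format (`yyyy-MM-dd HH:mm:ssZ`) is assumed to match what NUnit 3 itself writes.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical helper: the publisher only imports a <test-run> that carries
// a start-time attribute, so add one to synthesized failure reports.
public static class StartTimeFix
{
    public static void EnsureStartTime(XElement testRun, DateTime now)
    {
        if (testRun.Attribute("start-time") != null)
            return; // real runs already carry their own start time

        // The value is irrelevant for a failure report; "u" is an assumed format.
        testRun.SetAttributeValue("start-time",
            now.ToUniversalTime().ToString("u", CultureInfo.InvariantCulture));
    }
}
```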
* comment out most of templates/device-tests.yml

* fix yaml formatting

* remove condition from publish path

* try modifying path to TestSummary.md

* add .md

* try out merge pwd

* try out searching one dir up for summary .mds, cat merged summary

* fix dir path

* narrow down .md files grabbed

* try out publishing + downloading merged file

* try out some pwsh

* fix yaml for testing pwsh

* try out messy pwsh

* try pointing at right script location

* try out fuller path for pwsh script

* try out fuller path for pwsh script

* fix hash literal

* fix pwsh bug in

* try out Bearer

* change pwsh to PUT, try and print  to check validity

* try out new evaluation of token

* try print auth

* try print everything

* fix variables?

* remove jobs from matrix

* change some env var stuff

* more env

* try out new target_url

* try GitHub.Token

* add testing for Gh token

* try pwd for ps1

* try changing location to pwsh path

* try out ps1 from jenkins dir

* print out env vars

* fix target_url?

* fix for string concat

* fix GH url?

* fix GH token?

* unquote Authorization

* add user agent?

* Authorization -> Authentication

* try adding quotes to json

* print out bash json, remove hash from pwsh json payload

* fix underscore url

* change put to post

* try out full restmethod syntax

* manually enter gh token

* remove extra code that changes location

* re-add real token

* try using @params

* tools/devops/templates/publish-results.yml

* add back env vars

* remove some env variables

* env vars are actually needed...

* try adding context?

* uncomment a bunch of stuff in device-tests.yml

* move comment?

* try out dir path

* ls some more

* try out fix for run-tests.sh

* try setting variable for gh_status

* replace vsts device tests with dummy test

* add own sh script

* try setting job vars in script

* try out setting var in a bash task

* try out setting global var directly

* fix failure

* add GH status for failures, set tests to always fail:

* add publish_failure.ps1

* fix dir path for script

* skip iOS device test stage

* re-add stages keyword

* remove broken var stuff

* comment out dependency on iOS

* try adding some more env vars to pwsh script

* add env vars to json payload for status, run more than one test suite

* switch from tvos to ios to avoid agent issues

* fix broken stage

* fix broken syntax

* update dependencies to ios

* whoops

* print out commit comment json payload

* fix error in publish_failure

* try switching to tvos to see whether queue is shorter

* whoops, try again for tvos

* initial try at adding commit message in publish_failure.ps1

* fix json payload error

* fix formatting and emojis

* fix broken escape characters in json payload

* tweak context + add AGGREGATE to status for final status, try out getting status

* use api.github.com

* fix combined status url

* try setting status by querying GH for combined status

* fix the way that we get json object status

* try out accessing response like pwsh object

* Get-Location in pwsh scripts

* uncomment real test runs, try accessing testsummary.md for aggregate result comment

* revert stuff
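The commits above converge on posting a commit status to GitHub (POST rather than PUT, api.github.com, an Authorization header, a User-Agent, a JSON payload with a context). A C# sketch of that request, based on GitHub's documented REST API rather than on the actual publish_failure.ps1; owner, repo, SHA, token and context are placeholders, and the request is only built, not sent.

```csharp
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

// Sketch of a GitHub commit status request (POST /repos/{owner}/{repo}/statuses/{sha}).
// All argument values are placeholders supplied by the caller.
public static class GitHubStatus
{
    public static HttpRequestMessage Create(string owner, string repo, string sha,
        string token, string state, string targetUrl, string description, string context)
    {
        var request = new HttpRequestMessage(HttpMethod.Post,
            $"https://api.github.com/repos/{owner}/{repo}/statuses/{sha}");

        // GitHub accepts "token <PAT>" in Authorization and rejects requests
        // that have no User-Agent header.
        request.Headers.Authorization = new AuthenticationHeaderValue("token", token);
        request.Headers.UserAgent.Add(new ProductInfoHeaderValue("matrix-testing-sketch", "1.0"));

        string json = JsonSerializer.Serialize(new
        {
            state,              // one of: error, failure, pending, success
            target_url = targetUrl,
            description,
            context,            // statuses with different contexts are kept separately
        });
        request.Content = new StringContent(json, Encoding.UTF8, "application/json");
        return request;
    }
}
```

The aggregate status mentioned in the later commits can be read back with `GET https://api.github.com/repos/{owner}/{repo}/commits/{ref}/status`, whose `state` field combines all contexts for that commit.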
whitneyschmidt added the do-not-merge, skip-all-tests, build-package and skip-public-jenkins labels Apr 8, 2020
@monojenkins
Collaborator

Build success
ℹ️ Skipped execution

@xamarin-release-manager
Collaborator

Build was (probably) aborted

🔥 Jenkins job (on internal Jenkins) failed in stage(s) 'Running XM tests on '10.10'' 🔥 : hudson.AbortException: Xamarin.Mac tests on macOS 10.10 failed (xammac_tests)

Build succeeded
✅ Packages:

API Diff (from stable)
API Diff (from PR only) (no change)
Generator Diff (no change)
ℹ️ Test run skipped: Not running tests here because they're run on public Jenkins.
🔥 Xamarin.Mac tests on 10.10 failed: Xamarin.Mac tests on macOS 10.10 failed (xammac_tests) 🔥

@xamarin-release-manager
Collaborator

Build was (probably) aborted

🔥 Jenkins job (on internal Jenkins) failed in stage(s) 'Running XM tests on '10.10'' 🔥 : hudson.AbortException: Xamarin.Mac tests on macOS 10.10 failed (xammac_tests)

Build succeeded
✅ Packages:

API Diff (from stable)
API Diff (from PR only) (no change)
Generator Diff (no change)
ℹ️ Test run skipped: Not running tests here because they're run on public Jenkins.
🔥 Xamarin.Mac tests on 10.10 failed: Xamarin.Mac tests on macOS 10.10 failed (xammac_tests) 🔥

@monojenkins
Collaborator

Build success
ℹ️ Skipped execution

@xamarin-release-manager
Collaborator

Build was (probably) aborted

🔥 Jenkins job (on internal Jenkins) failed in stage(s) 'Upload to Azure' 🔥 : org.jenkinsci.plugins.workflow.steps.FlowInterruptedException

Build succeeded

@monojenkins
Collaborator

Build success
ℹ️ Skipped execution

@xamarin-release-manager
Collaborator

Build was (probably) aborted

🔥 Jenkins job (on internal Jenkins) failed in stage(s) 'Running XM tests on '10.10'' 🔥 : hudson.AbortException: Xamarin.Mac tests on macOS 10.10 failed (xammac_tests)

Build succeeded
✅ Packages:

API Diff (from stable)
API Diff (from PR only) (no change)
Generator Diff (no change)
ℹ️ Test run skipped: Not running tests here because they're run on public Jenkins.
🔥 Xamarin.Mac tests on 10.10 failed: Xamarin.Mac tests on macOS 10.10 failed (xammac_tests) 🔥

@monojenkins
Collaborator

Build success
ℹ️ Skipped execution

@monojenkins
Collaborator

Build success
ℹ️ Skipped execution

@xamarin-release-manager
Collaborator

Build was (probably) aborted

🔥 Jenkins job (on internal Jenkins) failed in stage(s) 'Running XM tests on '10.11', Running XM tests on '10.10', Running XM tests on '10.9'' 🔥 : hudson.AbortException: Xamarin.Mac tests on macOS 10.11 failed (dontlink (system))

Build succeeded
✅ Packages:

API Diff (from stable)
API Diff (from PR only) (no change)
Generator Diff (no change)
ℹ️ Test run skipped: Not running tests here because they're run on public Jenkins.
🔥 Xamarin.Mac tests on 10.11 failed: Xamarin.Mac tests on macOS 10.11 failed (dontlink (system)) 🔥
🔥 Xamarin.Mac tests on 10.10 failed: Xamarin.Mac tests on macOS 10.10 failed (xammac_tests, dontlink (system)) 🔥
🔥 Xamarin.Mac tests on 10.9 failed: Xamarin.Mac tests on macOS 10.9 failed (xammac_tests) 🔥

@xamarin-release-manager
Collaborator

Build was (probably) aborted

🔥 Jenkins job (on internal Jenkins) failed in stage(s) 'Packaging' 🔥 : org.jenkinsci.plugins.workflow.steps.FlowInterruptedException

Build succeeded

@monojenkins
Collaborator

Build success
ℹ️ Skipped execution

@xamarin-release-manager
Collaborator

Build was (probably) aborted

🔥 Jenkins job (on internal Jenkins) failed in stage(s) 'Packaging' 🔥 : org.jenkinsci.plugins.workflow.steps.FlowInterruptedException

Build succeeded

@monojenkins
Collaborator

Build success
ℹ️ Skipped execution

@xamarin-release-manager
Collaborator

Build was (probably) aborted

🔥 Jenkins job (on internal Jenkins) failed in stage(s) 'Running XM tests on '10.10'' 🔥 : hudson.AbortException: Xamarin.Mac tests on macOS 10.10 failed (xammac_tests)

Build succeeded
✅ Packages:

API Diff (from stable)
API Diff (from PR only) (no change)
Generator Diff (no change)
ℹ️ Test run skipped: Not running tests here because they're run on public Jenkins.
🔥 Xamarin.Mac tests on 10.10 failed: Xamarin.Mac tests on macOS 10.10 failed (xammac_tests) 🔥

rolfbjarne changed the base branch from master to main June 12, 2020 09:02
@xamarin-release-manager
Collaborator

Build was (probably) aborted

🔥 Jenkins job (on internal Jenkins) failed in stage(s) 'Running XM tests on '10.10', Running XM tests on '10.12'' 🔥 : hudson.AbortException: Xamarin.Mac tests on macOS 10.10 failed (apitest, xammac_tests)

Build succeeded
✅ Packages:

API Diff (from stable)
API Diff (from PR only) (no change)
Generator Diff (no change)
ℹ️ Test run skipped: Not running tests here because they're run on public Jenkins.
🔥 Xamarin.Mac tests on 10.10 failed: Xamarin.Mac tests on macOS 10.10 failed (apitest, xammac_tests) 🔥
🔥 Xamarin.Mac tests on 10.12 failed: Xamarin.Mac tests on macOS 10.12 failed (apitest) 🔥

@monojenkins
Collaborator

Build success
ℹ️ Skipped execution

whitneyschmidt added the not-notes-worthy label Sep 9, 2020
@mandel-macaque
Member

Closing: the YAML pipeline is too different to merge these changes at the moment. We will have to start from scratch.

mandel-macaque deleted the matrix-testing branch December 7, 2021 20:32
Labels
build-package Build (and create package) on internal Jenkins. Apply 'run-internal-tests' to run tests too. do-not-merge Do not merge this pull request not-notes-worthy Ignore for release notes skip-all-tests Skip all the tests skip-public-jenkins Completely skip execution in the public Jenkins instance
4 participants