Add doc on OS onboarding #112026

Merged
merged 13 commits into from
Feb 10, 2025
53 changes: 53 additions & 0 deletions docs/project/os-onboarding.md
@@ -0,0 +1,53 @@
# OS Onboarding Guide

Adding support for new operating systems (largely just new versions) is a frequent need. This guide describes how we do that, including the policies we follow.

References:

- [.NET OS Support Tracking](https://github.com/dotnet/core/issues/9638)
- [Prereq container image lifecycle](https://github.com/dotnet/dotnet-buildtools-prereqs-docker/blob/main/lifecycle.md)

## Context

In most cases, new OSes _may_ uncover problems in dotnet/runtime, but once those are resolved they don't affect up-stack components or apps. This is because nearly all the APIs that touch native code (networking, cryptography) and those that deal with standard formats (time zones, ASN.1) are in dotnet/runtime. In many cases, we only see test breaks.

Our testing philosophy is based on risk and past experience. The effective test matrix is huge, the product of OSes \* supported versions \* architectures. We try to make smart choices to skip testing most of the matrix while retaining much of the practical coverage. We also know where we tend to get bitten most when we don't pay sufficient attention. For example, our bug risk across Linux, macOS, and Windows is not uniform.
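
To make the matrix arithmetic concrete, here is a minimal sketch in Python. The OS names, versions, architectures, and the pruning rule are all hypothetical and chosen only for illustration; they are not the actual .NET support matrix or the team's real selection policy.

```python
from itertools import product

# Hypothetical values for illustration only; not the actual .NET support matrix.
# Versions are assumed to be listed oldest to newest.
OS_VERSIONS = {
    "alpine": ["3.20", "3.21"],
    "debian": ["12"],
    "ubuntu": ["22.04", "24.04"],
}
ARCHITECTURES = ["x64", "arm64", "arm32"]


def full_matrix(os_versions, architectures):
    """Every (os, version, arch) combination: the 'effective' matrix."""
    return [
        (os_name, version, arch)
        for os_name, versions in os_versions.items()
        for version, arch in product(versions, architectures)
    ]


def reduced_matrix(os_versions, architectures, primary_arch="x64"):
    """One possible pruning: newest version on every arch, older versions on one arch."""
    selected = []
    for os_name, versions in os_versions.items():
        *older, newest = versions
        selected += [(os_name, newest, arch) for arch in architectures]
        selected += [(os_name, version, primary_arch) for version in older]
    return selected


full = full_matrix(OS_VERSIONS, ARCHITECTURES)
reduced = reduced_matrix(OS_VERSIONS, ARCHITECTURES)
print(f"full: {len(full)} legs, reduced: {len(reduced)} legs")
```

Even with this tiny input, the pruned set is noticeably smaller than the full product while still touching every OS and every architecture at least once.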

## Approach

New OSes should be added/tested first in `main`. If changes are required, we should prove them out first in `main` before committing to shipping them in a servicing release. However, it isn't always necessary to backport test coverage.

There are two reasons (beyond known product breaks) to add a new OS reference to a release branch:

- Add coverage due to practice or known risk
- Update a reference to an EOL OS version

If those reasons don't apply, then we can often skip backporting new coverage.

If a .NET version will reach EOL in less than 6 months, new coverage can typically be skipped.
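
The backport policy above can be sketched as a small decision function. This is only an illustration: the type, field names, and the precedence between the rules are assumptions, since the guide does not rank the cases against each other.

```python
from dataclasses import dataclass


@dataclass
class BackportCandidate:
    """Inputs to the decision; all field names are hypothetical."""
    fixes_known_product_break: bool
    covers_known_risk: bool          # "coverage due to practice or known risk"
    replaces_eol_reference: bool     # "update a reference to an EOL OS version"
    months_until_dotnet_eol: float   # remaining support for the target .NET version


def should_backport(candidate: BackportCandidate, cutoff_months: float = 6) -> bool:
    """Return True when adding the OS reference to a release branch looks warranted."""
    # Assumption: product breaks and EOL references are handled regardless of
    # how much lifetime the .NET version has left.
    if candidate.fixes_known_product_break or candidate.replaces_eol_reference:
        return True
    # New coverage can typically be skipped when the .NET version is close to EOL.
    if candidate.months_until_dotnet_eol < cutoff_months:
        return False
    return candidate.covers_known_risk
```

For example, `should_backport(BackportCandidate(False, True, False, 3))` returns `False` under these assumptions, because the only motivation is new coverage and the release is within the 6-month cutoff.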

## End-of-life

We will often maintain our level of coverage when a new OS becomes available by replacing an older one. This ends up being an effective strategy for remediating EOL OSes ahead of time.
In some cases, we're required to test an OS version until the end of its life and will need to take specific action to remediate the reference.

Whatever the reason, we should update references to EOL OSes if we have them.
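
As a rough sketch of what "remediating ahead of time" could look like in tooling, the following Python flags references that are already EOL or will be within a warning window. The OS names and dates are placeholders invented for the example, and the function is hypothetical; real lifecycle data would come from the sources listed under References.

```python
from datetime import date

# Placeholder lifecycle data; these names and dates are invented for illustration.
EOL_DATES = {
    "exampleos-1": date(2025, 1, 31),
    "exampleos-2": date(2027, 6, 30),
}


def eol_references(referenced, eol_dates, today, warn_days=0):
    """Return referenced OS versions whose EOL date is within warn_days of today.

    With warn_days=0 only versions that are already EOL (or EOL today) are
    returned; a larger window flags upcoming EOLs ahead of time.
    """
    flagged = []
    for name in referenced:
        eol = eol_dates.get(name)
        # Versions with no known EOL date are left alone rather than guessed at.
        if eol is not None and (eol - today).days <= warn_days:
            flagged.append(name)
    return flagged


in_use = ["exampleos-1", "exampleos-2", "exampleos-3"]
print(eol_references(in_use, EOL_DATES, today=date(2025, 2, 10)))
```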

## Mechanics
Member

What about builds and changing the platform we build on?

Member Author

Builds are not affected by OS onboarding and use a different process. We only build on x64.

We've documented the process we use for Linux: https://github.com/dotnet/runtime/blob/main/docs/project/linux-build-methodology.md. Windows is simpler.

Member

My point is that we should also cover how new platforms apply to build and what we need to do to update the versions of build agents.

Member Author

It depends on what you mean. In practice, we don't do this. The last one of these that I am aware of is Apple Arm64. That was a massive project (as you recall) and no doc could cover that.

Do you have another example?

Member Author

Perhaps @sbomer @jkoritzinsky can think of some examples.


Most of our testing is done in container images. New images need to be created for each new version in the [dotnet/dotnet-buildtools-prereqs-docker](https://github.com/dotnet/dotnet-buildtools-prereqs-docker) repo. The repo is self-service and largely self-explanatory. One typically creates a new image using the pattern demonstrated by the previous version.
Member

I wouldn't say most - I think it's a mix of Helix queues and Docker images.

Member Author

Suggested change
Most of our testing is done in container images. New images need to be created for each new version in the [dotnet/dotnet-buildtools-prereqs-docker](https://github.com/dotnet/dotnet-buildtools-prereqs-docker) repo. The repo is self-service and largely self-explanatory. One typically creates a new image using the pattern demonstrated by the previous version.
Much of our testing is done in container images. New images need to be created for each new version in the [dotnet/dotnet-buildtools-prereqs-docker](https://github.com/dotnet/dotnet-buildtools-prereqs-docker) repo. The repo is self-service and largely self-explanatory. One typically creates a new image using the pattern demonstrated by the previous version.

Member

What about Helix queues? The most recent ask for Server 2025 was delayed for so long because we didn't have queues. I think we need to call that out as a prerequisite - unless there was a miss here and testing could have been unblocked with containers all along.

These images are referenced in our pipeline files:

- https://github.com/dotnet/runtime/blob/main/eng/pipelines/coreclr/templates/helix-queues-setup.yml
- https://github.com/dotnet/runtime/blob/main/eng/pipelines/libraries/helix.yml

Those files are for the `main` branch. The same files should exist at the same paths in release branches.
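
When auditing which images a branch references, a small script can help. The sketch below is hypothetical: it assumes prereq images are referenced as `mcr.microsoft.com/dotnet-buildtools/prereqs:<tag>`, and the sample queue lines are invented to resemble pipeline entries rather than copied from the real files.

```python
import re

# Assumption: prereq images appear as mcr.microsoft.com/dotnet-buildtools/prereqs:<tag>.
IMAGE_REF = re.compile(r"mcr\.microsoft\.com/dotnet-buildtools/prereqs:([A-Za-z0-9._-]+)")


def find_image_tags(pipeline_text):
    """Return the sorted, de-duplicated image tags referenced in a pipeline file."""
    return sorted(set(IMAGE_REF.findall(pipeline_text)))


# Invented sample content; the queue and tag names are placeholders.
SAMPLE = """
- (Example.1.Amd64.Open)host.queue.open@mcr.microsoft.com/dotnet-buildtools/prereqs:exampleos-1-helix-amd64
- (Example.2.Arm64.Open)host.queue.arm64.open@mcr.microsoft.com/dotnet-buildtools/prereqs:exampleos-2-helix-arm64v8
- Windows.Example.Open
"""

for tag in find_image_tags(SAMPLE):
    print(tag)
```

Running the same function over the text of the two pipeline files in `main` and in a release branch gives two tag lists that can be compared to see which references still need updating.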

Example PRs:

- <https://github.com/dotnet/runtime/pull/111768>
- <https://github.com/dotnet/runtime/pull/111504>
- <https://github.com/dotnet/runtime/pull/110492>
- <https://github.com/dotnet/dotnet-buildtools-prereqs-docker/pull/1282>
- <https://github.com/dotnet/dotnet-buildtools-prereqs-docker/pull/1314>