Setting up self-hosted Windows runner #4155

Closed
4 tasks done
wdanilo opened this issue Feb 5, 2023 · 0 comments
Assignees
Labels
-ci p-low Low priority x-chore Type: chore

wdanilo commented Feb 5, 2023

This task is automatically imported from the old Task Issue Board and it was originally created by Wojciech Daniło.
Original issue is here.


Description

The GitHub-hosted macOS runners are very slow. Because of that we had to disable some tests on this platform.

Resources

Hetzner offers dedicated servers with a Windows license.

Tasks:

  • Prepare a Docker image that can manually build the runner
  • Manually set up a CI job and test it
  • Semi-automatic runner setup
  • Runner deployment via metarunner

Blockers:

#180621134 resolved
#180570504 resolved
#180662049 resolved

Comments:

**Michał Urbańczyk** reports a new **STANDUP** for today (2021-11-29):

Today: Windows self-hosted CI: Started work on semi-automatic runner deployment. Encountered unpleasant surprises, including docker-compose issues with mounting volumes and runner registration not being transferable between the host and the container. This means that code and logic from the Linux deployment script cannot simply be reused.

New build script: Wanted to catch and debug macOS issues, but today the issue didn't want to appear. Instead, problems with setting up the runner itself appeared.

macOS self-hosted CI: Encountered issues with setting up CI runner as a service. Apparently every single platform has to be handled in its own magic way. It should be finished by 2021-12-03.

Next Day: Next day I will be working on the #same task. Windows self-hosted CI: Try setting up runner registration programmatically using the container image while preparing a volume for reusable containers. (Enso Bot - Nov 30, 2021)


**Michał Urbańczyk** reports a new **STANDUP** for today (2021-11-30):

Today: Windows self-hosted CI: Working on automated CI runner registration using the runner container context. It is like half-way done. It should be finished by 2021-12-03.

Next Day: Next day I will be working on the #same task. Windows self-hosted CI: Finish what was started today. (Enso Bot - Dec 1, 2021)


**Michał Urbańczyk** reports a new **STANDUP** for today (2021-12-01):

Today: Many calls — discussing CI epic planning with Michael, reviewing Łukasz's PR, discussing controllers refactoring with Adam and drinking a virtual coffee with Edward.

Windows self-hosted CI: I got in-container runner registration working. It should be finished by 2021-12-03.

Next Day: Next day I will be working on the #same task. Prepare a Windows version of metarunner. Figure out how to share that named pipe thingy and volume and avoid issues with docker compose. (Enso Bot - Dec 2, 2021)
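
For readers unfamiliar with unattended runner registration, the sketch below shows roughly what an in-container registration call can look like. It only builds the argument list for the runner's own `config.cmd` (`config.sh` on Linux) using flags the GitHub Actions runner documents. The repository URL, runner name, and labels are hypothetical placeholders, and this is not the script that was actually written for this task.

```python
from typing import List, Sequence


def build_config_args(
    repo_url: str,
    token: str,
    name: str,
    labels: Sequence[str] = (),
    work_dir: str = "_work",
) -> List[str]:
    """Return the arguments for the runner's config.cmd / config.sh."""
    args = [
        "--unattended",      # never prompt; there is no TTY in a container build
        "--url", repo_url,
        "--token", token,    # short-lived registration token
        "--name", name,
        "--work", work_dir,
        "--replace",         # take over a stale registration with the same name
    ]
    if labels:
        # Custom labels are passed as a single comma-separated value.
        args += ["--labels", ",".join(labels)]
    return args


if __name__ == "__main__":
    argv = build_config_args(
        "https://github.com/example-org/example-repo",  # hypothetical repository
        "<REGISTRATION_TOKEN>",                         # placeholder, not a real token
        "windows-runner-1",                             # hypothetical runner name
        labels=["engine", "docker"],                    # hypothetical labels
    )
    print(["config.cmd", *argv])
```

Keeping the argument construction in a pure function makes it easy to share between the Windows and Linux deployment paths, with only the entry point (`config.cmd` versus `config.sh`) differing.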


**Michał Urbańczyk** reports a new **🔴 DELAY** for today (2021-12-02):

Summary: There is 2 day delay in implementation of the Setting up self-hosted Windows runner (#180302888) task.
It will cause 2 day delay for the delivery of this sprint.

Delay Cause: Windows hates me, there is no justice in the world. docker-compose V2 has issues on Windows, Docker BuildKit does not support Windows (forcing me to devise a different way of injecting secrets), and ACLs are strange.
Also, had a number of calls / reviews over the week, adding to the delay.

Possible solutions: Suffering is inevitable. (Enso Bot - Dec 3, 2021)


**Michał Urbańczyk** reports a new **STANDUP** for today (2021-12-02):

Today: Figured out how to use docker from docker.

Windows self-hosted CI: I got in-container runner registration working. Work in progress to get metarunner working on Windows. It should be finished by 2021-12-05.

Next Day: Next day I will be working on the #same task. Finish the metarunner porting. (Enso Bot - Dec 3, 2021)


**Michał Urbańczyk** reports a new **STANDUP** for the last Friday (2021-12-03):

Today: Things are even worse. Runner registration cannot be reused outside a given container context, making it impossible to share it using volumes or even to bake it into the image. Unfortunately, this makes it impossible to refresh containers without re-registering all runners. It was possible until early this year, when a security patch blocked this possibility, see: https://about.signpath.io/blog/2021/03/23/dp-api-encryption-ineffective-in-windows-containers.html

This forces me to adopt a much more brutish design. This might increase the delay, but it is too early to tell at this point. It should be finished by 2021-12-05.

Next Day: Next day I will be working on the #same task. Actually finish metarunner using the stupid approach. (Enso Bot - Dec 6, 2021)


**Michał Urbańczyk** reports a new **🔴 DELAY** for today (2021-12-06):

Summary: There is 4 day delay in implementation of the Setting up self-hosted Windows runner (#180302888) task.
It will cause 2 day delay for the delivery of this sprint.

Delay Cause: Mostly troubles with finding a somewhat secure and somewhat sane way of providing secrets to runner containers on Windows. See standup for more information. Unfortunately, a totally new approach had to be designed, which does not allow for nice reuse of the Linux setup. While I now have the new design mostly in place, the whole thing still needs to be unified with the Linux runners, so we can automatically deploy runners across both systems.

2 days are faux to offset for the weekend.

Possible solutions: None, unless we'd drop desktop IDE packages support. (Enso Bot - Dec 7, 2021)


**Michał Urbańczyk** reports a new **STANDUP** for today (2021-12-06):

Today: Finally found an acceptable way of setting up runners and dealing with the secrets necessary for runner registration. Made the first semi-automatic deployments of runners. Set up a workflow that tries to run engine CI. Still needs more debugging and fixing. It should be finished by 2021-12-09.

Next Day: Next day I will be working on the #same task. Start investigating whether build works. Reconcile windows CI branch with the Linux one, so we can deploy both kinds of runners. (Enso Bot - Dec 7, 2021)


**Michał Urbańczyk** reports a new **STANDUP** for yesterday (2021-12-07):

Today: Did some investigation into the issue with testing Engine on the new runner.
Suffered from MAX_PATH. (Yes, they kind of fixed it, but the fix is opt-in.)
Started refactoring the script for deploying runners. It should be finished by 2021-12-09.

Next Day: Next day I will be working on the #same task. Debug the issues, while refactoring the script. (Enso Bot - Dec 8, 2021)
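
Some background on the MAX_PATH remark in the standup above: classic Win32 file APIs limit paths to 260 characters including the terminating NUL. The opt-in fix is the `LongPathsEnabled` registry value under `HKLM\SYSTEM\CurrentControlSet\Control\FileSystem`, and the application also has to declare itself long-path aware. A common alternative is the `\\?\` extended-length prefix. The sketch below is a minimal illustration of that prefixing and is not taken from the build script; it uses `ntpath`, so it behaves the same on any OS.

```python
import ntpath

MAX_PATH = 260  # classic Win32 limit, counting the terminating NUL

EXTENDED_PREFIX = "\\\\?\\"          # the literal text \\?\
EXTENDED_UNC_PREFIX = "\\\\?\\UNC\\"  # the literal text \\?\UNC\


def exceeds_max_path(path: str) -> bool:
    """True if the path is too long for APIs that still enforce MAX_PATH."""
    return len(path) >= MAX_PATH


def to_extended_length(path: str) -> str:
    """Return the extended-length form of an absolute Windows path."""
    if path.startswith(EXTENDED_PREFIX):
        return path  # already in extended-length form
    # The prefix disables normalization in the Win32 layer, so do it up front.
    normalized = ntpath.normpath(path)
    drive, _ = ntpath.splitdrive(normalized)
    if not drive or not ntpath.isabs(normalized):
        raise ValueError("extended-length paths must be fully qualified")
    if normalized.startswith("\\\\"):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return EXTENDED_UNC_PREFIX + normalized[2:]
    return EXTENDED_PREFIX + normalized


if __name__ == "__main__":
    deep = "C:\\work\\" + "\\".join(["nested"] * 50) + "\\file.txt"
    print(exceeds_max_path(deep))
    print(to_extended_length("C:/work/enso/../build"))
```

Whether the prefix helps in practice depends on every tool in the build chain accepting it, which is why the registry opt-in is often the more practical route on CI machines.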


**Michał Urbańczyk** reports a new **🔴 DELAY** for today (2021-12-09):

Summary: There is 4 day delay in implementation of the Setting up self-hosted Windows runner (#180302888) task.
It will cause 4 day delay for the delivery of this sprint.

Delay Cause: Need to rewrite a portion of Linux runner deployment logic to align with the approach forced by Windows-limited docker.
Also, the build failures for Engine are not yet fixed. Depending on conditions, this may increase the delay further.

2 days are faux to offset for the weekend.

Possible solutions: None, as this is the right way. (Enso Bot - Dec 9, 2021)


**Michał Urbańczyk** reports a new **STANDUP** for yesterday (2021-12-08):

Today: Big progress on refactoring the runner deployment and unifying Linux with Windows. This time Linux hit me a few times with its idiosyncrasies.
The script can now successfully deploy runners on both platforms. Metarunner has been updated, builds, and seemingly works (at least on Windows), but it has not been properly tested or deployed. It should be finished by 2021-12-13.

Next Day: Next day I will be working on the #same task. Focus on metarunner - try to reinstate them. Try to redeploy previous GUI self-hosted runners using the new infrastructure. (Enso Bot - Dec 9, 2021)


**Michał Urbańczyk** reports a new **🔴 DELAY** for the last Friday (2021-12-10):

Summary: There is 2 day delay in implementation of the Setting up self-hosted Windows runner (#180302888) task.
It will cause 2 day delay for the delivery of this sprint.

Delay Cause: The issue with Enso standard library tests failing when run in a Windows container is more troublesome than expected.
Our tests want to read a PNG file, which brings in an OpenCV dependency. OpenCV requires the Media Foundation Platform (MF), a system component that is not installed in our container image. Also, it is impossible to add it as an optional feature. We can't use other images that provide the library, because there are none for Windows Server 2019, and we cannot use a different version if we want to rely on process isolation mode for CI runners.

Missing DLLs cannot be naively uploaded from non-Docker image of Windows Server 2019.

Possible solutions: I will try to provide faux DLLs that should make dynamic linker happy. If this proves to be troublesome, I'll just disable these tests and defer the issue. (Enso Bot - Dec 13, 2021)
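
The version constraint mentioned in the delay cause can be made concrete. With process isolation, a Windows container shares the host kernel, so on Windows Server 2019 (build 10.0.17763) the image has to carry the same build number as the host, while the revision component may differ. The sketch below is a conservative check under that assumption; the revision numbers in the usage example are illustrative only, and newer Windows releases relax the rule.

```python
from typing import Tuple


def parse_build(version: str) -> Tuple[int, int, int]:
    """Parse 'major.minor.build[.revision]' and return (major, minor, build)."""
    parts = version.split(".")
    if len(parts) < 3:
        raise ValueError(f"expected at least major.minor.build, got {version!r}")
    major, minor, build = (int(p) for p in parts[:3])
    return major, minor, build


def can_use_process_isolation(host_version: str, image_version: str) -> bool:
    """Conservative rule: host and image must agree on major.minor.build.

    The fourth component (revision) is ignored, since a mismatch there
    does not prevent a process-isolated container from starting.
    """
    return parse_build(host_version) == parse_build(image_version)


if __name__ == "__main__":
    host = "10.0.17763.2366"  # Windows Server 2019 host, illustrative revision
    print(can_use_process_isolation(host, "10.0.17763.2300"))  # same build
    print(can_use_process_isolation(host, "10.0.20348.350"))   # Windows Server 2022 build
```

An image with a mismatched build can still run under Hyper-V isolation, but that is a different trade-off from the process isolation the runners here rely on.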


**Michał Urbańczyk** reports a new **STANDUP** for the last Friday (2021-12-10):

Today: Trying to find a workaround to issue with failing standard library tests on self-hosted Windows CI runner. See delay report for details. It should be finished by 2021-12-15.

Next Day: Next day I will be working on the #same task. Apply a workaround that allows us to carry on with Windows CI. (Enso Bot - Dec 13, 2021)


**Michał Urbańczyk** reports a new **STANDUP** for today (2021-12-13):

Today: Actually fixed the MF issue by using a different image. It just required some tag twiddling.
Continued to debug issues with the Windows build. Provided the missing 7z on the Windows runner and fixed network issues on the Linux runner. It should be finished by 2021-12-15.

Next Day: Next day I will be working on the #same task. Solve any matters requiring attention with Windows CI. Otherwise, attempt to deploy metarunners proper. (Enso Bot - Dec 13, 2021)


**Michał Urbańczyk** reports a new **🔴 DELAY** for today (2021-12-14):

Summary: There is 1 day delay in implementation of the Setting up self-hosted Windows runner (#180302888) task.
It will cause 1 day delay for the delivery of this sprint.

Delay Cause: Linux metarunner needs more reworking than expected due to changing how things worked. Also, documentation needs to be rewritten. Also, there's still an outstanding Engine Test failure.

Possible solutions: None that I know of. (Enso Bot - Dec 15, 2021)


**Michał Urbańczyk** reports a new **STANDUP** for today (2021-12-14):

Today: Cleanups in runner images, fixing Linux images to follow the same convention as Windows ones. Debugging failures, working on the new bootstrap scripts. It should be finished by 2021-12-16.

Next Day: Next day I will be working on the #same task. Finish work on metarunners. Look into the Engine tests failure causes. (Enso Bot - Dec 15, 2021)


**Michał Urbańczyk** reports a new **🔴 DELAY** for today (2021-12-15):

Summary: There is 1 day delay in implementation of the Setting up self-hosted Windows runner (#180302888) task.
It will cause 1 day delay for the delivery of this sprint.

Delay Cause: Insane regression that appeared after updating the Rust toolchain in the build script. Calls to cmd scripts started failing if the command contains a space character. This particularly hits usage of sbt and our own Engine's enso, as they are actually not binaries but cmd scripts.
Debugging this and looking for workaround took most of the last day. Still need to reduce and report.

Possible solutions: None that I know of. (Enso Bot - Dec 16, 2021)


**Michał Urbańczyk** reports a new **STANDUP** for today (2021-12-15):

Today: Fixing and deploying metarunner. Looks good.
Spent tons of time trying to debug the issue with invoking cmd scripts (see the delay). It should be finished by 2021-12-17.

Next Day: Next day I will be working on the #same task. Debug the Windows runner failure with the Engine test cleanup.
Further debug and report (if needed) the cmd script calling regression. (Enso Bot - Dec 16, 2021)


**Michał Urbańczyk** reports a new **STANDUP** for yesterday (2021-12-16):

Today: Reducing and reporting the cmd issue to Rust: rust-lang/rust#91991
Debugging the tmp files removal issue. It should be finished by 2021-12-17.

Next Day: Next day I will be working on the #same task. Deploy runners, prepare PR with remaining changes to the Enso code. (Enso Bot - Dec 17, 2021)


**Michał Urbańczyk** reports a new **STANDUP** for the last Friday (2021-12-17):

Today: Deployed runners, did some minor cleanups, prepared the PR that completed the task. It should be finished by 2021-12-17.

Next Day: Next day I will be working on the ##180302836 task. Get back to finishing the build script rewrite — test on macOS M1 environment, fix issues. (Enso Bot - Dec 20, 2021)

