Prepare initial Zuul CI setup #9103

Closed
3 of 4 tasks
webknjaz opened this issue Nov 4, 2020 · 20 comments

Comments

@webknjaz
Member

webknjaz commented Nov 4, 2020

This is a spin-off of #7279 where folks pre-agreed to explore the possibility of extending pip testing experience with external Zuul CI resources. Let's use this issue to coordinate this effort.

@ssbarnea offered to help with the maintenance of the CI itself.

Action items:

@pradyunsg
Member

Well, the app is installed. :)

@ianw
Contributor

ianw commented Nov 4, 2020

@pradyunsg thank you. I'll shepherd things through on our side and get back

@pradyunsg
Member

:)

As a heads-up, our CI takes ~20 minutes on Linux, on the 2 vCPU machines we get from most commercial CI providers (like GitHub Actions, Travis CI, Azure Pipelines). I'm not sure what the details of these external Zuul instances/checks/resources would be [1] but I do think that if we're going to run tests on an even slightly decent matrix, they'd need to be parallelized to not take, like, 2 hours.

[1]: If someone with visibility could share the details on this, that'd be great!
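
As a rough illustration of the arithmetic behind the "2 hours" concern above, here is a minimal sketch. The 20-minute job time comes from the comment; the six-job matrix and the runner counts are hypothetical values picked only to make the numbers concrete, and the model assumes every job takes the same time.

```python
import math


def wall_clock_minutes(num_jobs: int, minutes_per_job: float, parallel_runners: int) -> float:
    """Estimate wall-clock time when equal-length jobs run in batches."""
    if parallel_runners < 1:
        raise ValueError("need at least one runner")
    # Jobs are processed in batches of `parallel_runners`; each batch takes one job's duration.
    batches = math.ceil(num_jobs / parallel_runners)
    return batches * minutes_per_job


MINUTES_PER_JOB = 20  # "~20 minutes on Linux", from the comment above
MATRIX_JOBS = 6       # hypothetical matrix size, for illustration only

serial = wall_clock_minutes(MATRIX_JOBS, MINUTES_PER_JOB, parallel_runners=1)
parallel = wall_clock_minutes(MATRIX_JOBS, MINUTES_PER_JOB, parallel_runners=6)
print(serial, parallel)  # 120 20
```

Under these assumptions, a fully serial run of the matrix lands at two hours, while running every job on its own node keeps the total at a single job's duration.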

@ianw
Contributor

ianw commented Nov 5, 2020

If someone with visibility could share the details on this, that'd be great!

Our basic host is a dedicated VM with 8 GB of RAM and 8 CPUs. This is mentioned at [1]. These VMs are donated by a range of hosting providers (you can in fact see them all at [2]).

[1] https://docs.opendev.org/opendev/system-config/latest/contribute-cloud.html#contributing-cloud-test-resources
[2] https://opendev.org/openstack/project-config/src/branch/master/nodepool

@pradyunsg
Member

pradyunsg commented Nov 5, 2020

Those are about 4x our existing setups, and I'd expect our tests to scale pretty well with that (I think our tests are I/O bound w/ bursty compute).

That sounds great actually -- it's probably big enough to run the entire suite with RAM disks, I think, which would probably work even better since it might reduce the CI times a fair bit (which is better for everyone, given that it's shared+donated resources).

@mnaser

mnaser commented Nov 5, 2020

FWIW, just chiming in: using Zuul as nothing but a third-party signal and continuous integration utility really defeats the purpose of it. IMHO, it makes Zuul quite meaningless when it comes to all of its really powerful features. It really shines when you let it gate your project, which I think is what should be taken into consideration.

Otherwise, it's just a boring job runner like any other hosted CI.

@pfmoore
Member

pfmoore commented Nov 5, 2020

It really shines when you let it gate your project, which I think should be what is taken into consideration.

One problem I have with the Zuul docs, from my (very brief, out of necessity) skim, is that they use a lot of terminology that I'm not familiar with. Here's an example: what do you mean by "gate the project"?

@ianw
Contributor

ianw commented Nov 5, 2020

gate the project

When using Zuul to its full potential, humans do not merge changes/pull requests. You indicate to Zuul that a change is reviewed and tell it that it is safe to merge, and it applies the change to the current HEAD, runs CI and commits the change only after that has passed. You can no longer merge a broken change because you ran the CI 3 days ago against a now out-of-date HEAD and someone else has merged, say, an API change in between -- that would cause the "gate" CI to fail and the change would be rejected; you would rework it, re-review it and submit it again. That's "gating".

Zuul can certainly operate like this on this project. However, the current app doesn't have write permissions, so it's not configured for it (the Zuul that Ansible uses, in various ways, is configured for it, however).

I have not beaten it into shape for commit to the docs, but I have written up

https://review.opendev.org/#/c/683085/3/doc/source/discussion/zen.rst

which is a more "conversational" view of what Zuul does and is perhaps of interest to you.
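
The gating flow described above can be sketched as a toy simulation. This is not Zuul's implementation or API; `Tree`, `run_ci`, `gate`, and both changes are hypothetical names invented for the example. It only models the idea that a change is applied to the current HEAD, tested, and merged only if that test passes.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class Tree:
    """Toy repository state: one API name plus the names its callers use."""
    api_name: str
    callers: Tuple[str, ...]


Change = Callable[[Tree], Tree]


def run_ci(tree: Tree) -> bool:
    """Toy CI: passes only if every caller uses the current API name."""
    return all(caller == tree.api_name for caller in tree.callers)


def rename_api(tree: Tree) -> Tree:
    """Change A: rename the API and update the callers that exist right now."""
    return Tree(api_name="download", callers=tuple("download" for _ in tree.callers))


def add_caller(tree: Tree) -> Tree:
    """Change B: add a caller, written against the old API name."""
    return replace(tree, callers=tree.callers + ("fetch",))


def gate(head: Tree, change: Change) -> Tuple[Tree, bool]:
    """Apply the change to the *current* HEAD and merge only if CI passes."""
    candidate = change(head)
    if run_ci(candidate):
        return candidate, True
    return head, False  # rejected; HEAD is left untouched


base = Tree(api_name="fetch", callers=("fetch",))

# Both changes are green when tested in isolation against the old HEAD.
assert run_ci(rename_api(base)) and run_ci(add_caller(base))

head, merged_a = gate(base, rename_api)
head, merged_b = gate(head, add_caller)
print(merged_a, merged_b, run_ci(head))  # True False True
```

Change B is rejected at the gate because it is retested against a HEAD that already contains change A, so the main branch stays green.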

@albinvass

One problem I have with the Zuul docs, from my (very brief, out of necessity) skim, is that they use a lot of terminology that I'm not familiar with. Here's an example: what do you mean by "gate the project"?

There's a nice short video on the front page that should make it a bit clearer: https://zuul-ci.org/

:)

@pfmoore
Member

pfmoore commented Nov 5, 2020

When using Zuul to its full potential, humans do not merge changes/pull requests.

Right. I doubt we'd want to do anything like that. All we're looking for (at least, in my opinion) is a CI runner. This whole conversation was triggered because Travis changed their Ts&Cs, and as a result we have ended up on just one runner, GitHub Actions. Our limitation is CI runtime.

Add to that some comments that people feel we should have a bigger test matrix (which I'm not sure we need, @pradyunsg explained why on the other thread) and we ended up discussing zuul. But for me, it's still just "how can we push through our test suite on CI faster" along with some people (not me) being concerned that we currently rely on just one platform.

Specifically, I don't want to see the pip developers spending our very limited time on re-engineering our CI. As a volunteer project, I don't get to dictate, but my hope is that we focus on improving pip's code, and just have CI that's "good enough" (or maintained on our behalf by others 🙂). So I'm very happy to see others offering CI for pip, but I want it to be low (or zero) effort for the pip devs to adopt.

There's a nice short video on the front page

I don't do videos for stuff like that, sorry. I prefer to skim text at my own pace.

@albinvass

I don't do videos for stuff like that, sorry. I prefer to skim text at my own pace.

Alright. I usually think things are a bit clearer when I have an image in front of me showing how things would fit together.
Maybe this will suit you better: https://zuul-ci.org/docs/zuul/discussion/concepts.html#zuul-concepts

@ssbarnea
Contributor

ssbarnea commented Nov 5, 2020

@pfmoore In short, gating is not unique to Zuul. It is the process where the merge is not made by a human but by the CI/CD pipeline when the right conditions are met, mainly only after testing against the final form of the code. There are lots of GitHub projects that use a gated approach where no human has merge rights. They usually use a label to mark a change as "ready-to-merge", and the bot takes care of rebasing, retesting and doing the merge. It produces a chain of changes that gradually go into the final product without requiring a human steward to watch them.

Over the years I've seen lots of accidents where a change broke the code because CI ran on an older version of the code. It's the old case where changes A and B are perfectly fine in isolation, but if you put them together, they break. GitHub has an option to require an updated branch before merging, but that proves not to work well with active projects, ones that have many changes being made, especially when combined with long-running jobs. It creates a long cascade of rebases.
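
To make the "cascade of rebases" concrete, here is a minimal sketch of a deliberately simplified model. It assumes that every approved PR rebases and re-runs CI immediately after each merge, that no PR ever fails or conflicts, and that there are enough runners for all PRs to test at once. The function names are hypothetical, and the 20-minute CI duration is borrowed from the figure quoted earlier in this thread.

```python
def ci_runs_with_eager_rebase(num_prs: int) -> int:
    """Count CI runs when every merge forces all remaining PRs to rebase and retest."""
    runs = 0
    remaining = num_prs
    while remaining > 0:
        runs += remaining  # every remaining PR runs CI against the current HEAD
        remaining -= 1     # one PR merges; the others are now out of date
    return runs


def wall_clock_minutes(num_prs: int, ci_minutes: float) -> float:
    """Each merge has to wait for one fresh CI round, so the rounds are sequential."""
    return num_prs * ci_minutes


print(ci_runs_with_eager_rebase(5))  # 15
print(wall_clock_minutes(5, 20))     # 100
```

In this model the number of CI runs grows quadratically with the number of queued PRs, which is one way to see why the "require updated branch" option scales poorly on active projects with long-running jobs.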

@pfmoore
Member

pfmoore commented Nov 5, 2020

Got it, thanks.

IMO pip's workflow isn't perfect, but it's an exercise in balance between catching as much as we can and not spending too much of our precious developer time on infrastructure. Like pretty much everything else in the world 🙂

@ianw
Contributor

ianw commented Nov 6, 2020

The tenant and basic configuration are now merged and live

https://zuul.opendev.org/t/pypa/status

@webknjaz
Member Author

webknjaz commented Nov 6, 2020

A separate tenant — yaaay! 🎉

@webknjaz
Member Author

webknjaz commented Nov 6, 2020

@pfmoore @pradyunsg re: gating — the current effort is to just add more resources and do small incremental steps so that it's not too overwhelming. That's why I didn't even bring up gating in my messages.

But I'd still like to make a comment on this. Both of you seem to think that gating would introduce maintenance burden, friction, and consume an enormous amount of time. I think that there are two separate things that got mixed up in this point of view.

Gating itself is just letting some automated system do the merge instead of you. Essentially, this means that if a PR gets reviewed and labeled as approved to be merged before it gets all of the CI statuses, you don't have to babysit it in order to catch the moment when the Merge button becomes green. You can spend this time doing something useful instead. So it actually saves time rather than consuming it.
Another case is the real-world friction that you're not protected from in a normal workflow: you get PR A and PR B, they both have green statuses, but when you merge them both into master, it becomes red. What would you do normally? I guess it'd be firefighting and debugging the CI on the red master. That's time-consuming, and it introduces friction for anybody sending in PRs, because their PRs are all now red because of something unrelated. Even worse, they may have a mixture of tests failing both because of the broken master and because of their own bugs, and they have to somehow figure out which problems are relevant.
Now, if you have gating in place, then in this scenario you'll be notified about a breaking change early, before it gets into master, and everybody sending PRs won't be as frustrated. So, in my mind, it's another example of how gating improves the experience rather than making it worse.

Another thing that got into the mix-up is extending the envs matrix. This is what really could introduce more burden because one may end up needing to debug more platforms than they are knowledgeable about. This is also something that the initial setup will probably avoid and will need to be agreed upon with folks who are actually affected by the burden presented.


That's all I wanted to say for now. And to reiterate: the goal of this issue isn't to introduce anything gating-related. But in the future, it'd be interesting to explore it and maybe enable it for a day or a week so that people can try it out and give actual feedback on whether it's annoying or helpful for them.

@webknjaz
Member Author

webknjaz commented Nov 6, 2020

@pradyunsg FYI the configuration PR already has some status reports: https://github.com/pypa/pip/pull/9107/checks?check_run_id=1361306493.

@webknjaz
Member Author

@pradyunsg @pfmoore: I think it may be interesting for you to watch https://youtu.be/mjUPThomu4Q and maybe https://youtu.be/vb0Iuf-6wHs.

@ichard26
Member

Are we still interested in using Zuul as part of CI? If not, let's close this.

@webknjaz
Member Author

I think it's safe to assume that nobody's going to drive this effort either way...

webknjaz closed this as not planned on Apr 18, 2024
github-actions bot locked this as resolved and limited conversation to collaborators on May 19, 2024
8 participants