
Split builder from operator #583

Merged
merged 30 commits into apache:master on Apr 5, 2019

Conversation

@astefanutti (Member) commented Mar 28, 2019

Fixes #125.

Todo:

  • Add a Build CRD controller to manage the build pod lifecycle
  • Polish the reconciliation with the IntegrationContext and Integration CRs lifecycle
  • Serialized builds working queue
  • Restore support for attaching resources to the build
  • Adapt persistent volume mount for Kaniko strategy

Next possible items:

@lburgazzoli (Contributor) commented Mar 29, 2019

About "Support concurrent builds" we need to be careful as this may end up not reusing a context or not having a base image to reuse for incremental build.

@astefanutti marked this pull request as ready for review on April 2, 2019 at 13:20

@astefanutti (Member Author)

I'll continue working on improvements, but I think this PR has met the original requirement and has grown large enough to be reviewed.

@@ -111,6 +111,12 @@ func Publisher(ctx *builder.Context) error {
args = baseArgs
}

// The pod will be scheduled to nodes that are selected by the persistent volume

Member:

How can it be shared with the Kaniko pod if the Kubernetes provider allows only RWO volumes? Affinity was there to make the two pods coexist on the same node so that they can share the volume. Local volumes are interesting...

Member Author:

My understanding is that the scheduling of the pods that use a persistent volume should be managed by the platform (Kubernetes). As we are using a PVC with the default storage class, it is up to the configured provisioner for the cluster to specify node affinity for the provisioned persistent volume. Then pods that use this PV will only be scheduled to nodes that are selected by the node affinity.

So multi-node clusters should be using local volumes instead of hostPath volumes. The provisioned local volumes should have node affinity defined for the node they've been provisioned on. This generalises to other kinds of storage devices.

Single-node clusters like those managed by Minikube or MiniShift can still use hostPath volumes. This is actually what the provisioner for the configured default storage class does.

In any case, I don't think it is manageable to hard-code / handle the scheduling in the operator.

Member Author:

For example, in a multi-node cluster whose default storage class is configured with a provisioner that uses local volumes, the build pod and the Kaniko pod will be scheduled on the same node: they both use the same PVC source, hence the same volume, and are therefore subject to the node affinity defined for the local volume provisioned on that node.
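
To make the mechanism described in the last two comments a bit more concrete, here is a rough sketch (not part of this PR) of the kind of local PersistentVolume a dynamic provisioner could create on a multi-node cluster, expressed with the Kubernetes Go API types. The node name, host path and capacity are made-up values; the point is only that the node affinity carried by the volume is what drives the co-location of the pods sharing the PVC.

```go
package build

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// localBuildVolume sketches a provisioned local PV whose node affinity pins
// every pod that mounts the bound PVC to "node-1" (all values are illustrative).
func localBuildVolume() *corev1.PersistentVolume {
	return &corev1.PersistentVolume{
		ObjectMeta: metav1.ObjectMeta{Name: "camel-k-builder-pv"},
		Spec: corev1.PersistentVolumeSpec{
			Capacity: corev1.ResourceList{
				corev1.ResourceStorage: resource.MustParse("1Gi"),
			},
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			PersistentVolumeSource: corev1.PersistentVolumeSource{
				Local: &corev1.LocalVolumeSource{Path: "/mnt/camel-k-builder"},
			},
			// The scheduler places any pod using this volume onto the selected
			// node, so the build pod and the Kaniko pod end up co-located without
			// the operator hard-coding any affinity itself.
			NodeAffinity: &corev1.VolumeNodeAffinity{
				Required: &corev1.NodeSelector{
					NodeSelectorTerms: []corev1.NodeSelectorTerm{{
						MatchExpressions: []corev1.NodeSelectorRequirement{{
							Key:      "kubernetes.io/hostname",
							Operator: corev1.NodeSelectorOpIn,
							Values:   []string{"node-1"},
						}},
					}},
				},
			},
		},
	}
}
```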

Member:

Ok, I'll tell you what I remember from when I fixed the issue on GKE.
The operator was scheduled on some node and its volume was bound correctly. Whenever a build was started, a Kaniko pod was created and scheduled to a random node. The volume only got bound when the scheduler happened to assign the Kaniko pod to the same node as the operator.
By adding affinity, I told the cluster where I wanted the Kaniko pod to run, so it could always bind the volume.

That said... now I think we're removing the option of sharing data between the operator and Kaniko, so removing the affinity makes perfect sense...

😄

Member Author:

The build pods still need to share the build output with Kaniko. And the Kaniko warmer pod needs to share the cache as well to be effective.

I have to test on GKE, but it seems from your experience that the provisioner for the default storage class uses hostPath.

In the long term, I think the affinity should come from the persistent volume and the scheduling should be managed by the platform, to support other kinds of volumes and to avoid the operator and the builds being scheduled onto a single node. But in the short term, until we do some more testing, I suggest:

  • I add the affinity constraint back (a rough sketch of such a constraint follows this list)
  • I set the build strategy to pod when the Kaniko publish strategy is set
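
For illustration only, re-adding such a constraint could look roughly like the snippet below, again using the Kubernetes Go API types. The label used to select the pod that owns the build volume is hypothetical, not taken from this PR.

```go
package build

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// colocateWithBuilder returns a pod affinity that forces the Kaniko pod onto
// the same node as the pod carrying the (hypothetical) builder label, so the
// two keep sharing the ReadWriteOnce build volume.
func colocateWithBuilder() *corev1.Affinity {
	return &corev1.Affinity{
		PodAffinity: &corev1.PodAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
				LabelSelector: &metav1.LabelSelector{
					MatchLabels: map[string]string{
						// Illustrative label, not the one the operator actually uses.
						"camel.apache.org/component": "operator",
					},
				},
				TopologyKey: "kubernetes.io/hostname",
			}},
		},
	}
}
```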

Member:

Yeah, right.. we still have 2 pods, even in pod mode.

I don't remember whether it uses hostPath or another implementation (I think it's the latter), but I suggest we do some testing on GKE before releasing the next version.

Member Author:

I've just added the constraint back, so it should operate as before. I'll investigate multi-node build support, and eventually removing the constraint, ASAP. I wonder how Knative Build handles the build input / output...

@nicolaferraro (Member)

Love the pod mode, which is compatible with Knative Build.
I wonder if we should also provide an always-up build operator mode, to speed up builds in Kubernetes environments without Knative Build.

@lburgazzoli (Contributor)

> Love the pod mode, which is compatible with Knative Build.
> I wonder if we should also provide an always-up build operator mode, to speed up builds in Kubernetes environments without Knative Build.

+1

@astefanutti (Member Author) commented Apr 4, 2019

> Love the pod mode, which is compatible with Knative Build.
> I wonder if we should also provide an always-up build operator mode, to speed up builds in Kubernetes environments without Knative Build.

Agreed. It should be quite straightforward with the existing building blocks. With that new builder service strategy, a Deployment would be created and configured to delegate to the routine build strategy.
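
As a rough sketch of that idea (nothing in this PR implements it), the builder service strategy could boil down to creating a Deployment along the lines below; the image, command and label are placeholders, not what the operator would actually generate.

```go
package build

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// builderServiceDeployment sketches an always-up builder Deployment that would
// delegate incoming builds to the routine build strategy.
func builderServiceDeployment(namespace string) *appsv1.Deployment {
	labels := map[string]string{"camel.apache.org/component": "builder"} // placeholder label
	replicas := int32(1)
	return &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "camel-k-builder", Namespace: namespace},
		Spec: appsv1.DeploymentSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:    "builder",
						Image:   "docker.io/apache/camel-k:latest", // placeholder image
						Command: []string{"camel-k", "--builder"},  // placeholder command
					}},
				},
			},
		},
	}
}
```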

@lburgazzoli merged commit d5151bd into apache:master on Apr 5, 2019
@astefanutti deleted the build branch on April 6, 2019 at 10:55