Add rfc for k8s multi cluster deployment #5069

Merged (6 commits) on Aug 9, 2024
File: docs/rfcs/0014-multi-cluster-deployment-for-k8s.md (337 additions)
- Start Date: 2024-07-25
- Target Version: 0.49.0

# Summary

This RFC proposes a new feature for Kubernetes applications to deploy resources to multiple clusters.

# Motivation

# Use cases

- Case 1: Applying the same manifest to multiple clusters for a redundant configuration
- Case 2: Applying a manifest with per-cluster patches to multiple clusters for a redundant configuration
- Case 3: Blue/green deployment across clusters

# Detailed design

## Overview

We propose a feature to apply manifests to multiple clusters from a single application.

![image](assets/0014-pipeline-image.png)

## How it works

### Register Application with multiple platform providers

When registering an application, we can choose multiple platform providers.
First, we add one platform provider.
To deploy to multiple clusters, we can optionally add more platform providers.
Only the platform providers specified here can be configured as multi-target destinations.

![image](assets/0014-choose-multiple-providers.png)

Also, we can check the list of platform providers on the piped list page to verify them.

![image](assets/0014-piped-list.png)

### QuickSync

Piped asynchronously applies the resources to each environment based on the platform provider and `resourceDir` specified by the user.

For example, consider deploying a microservice called `microservice-a` to two clusters, `cluster-hoge` and `cluster-fuga`.
First, we prepare one application with one `app.pipecd.yaml` and manifests laid out as below.
Set the `multiTarget` item under `spec.quickSync` in `app.pipecd.yaml`, and for each entry set the directory containing the manifests you want to deploy and the platform provider to deploy to.
Then a single quickSync execution deploys to both `cluster-hoge` and `cluster-fuga` at the same time.

```
microservice-a
└── prd
├── app.pipecd.yaml
├── base
│   ├── deployment.yaml
│   ├── kustomization.yaml
│   └── service.yaml
├── cluster-hoge
│   └── kustomization.yaml
├── cluster-fuga
│   └── kustomization.yaml
└── kustomization.yaml
```

```yaml
# app.pipecd.yaml
apiVersion: pipecd.dev/v1beta1
kind: KubernetesApp
spec:
  name: multi-cluster-app
  labels:
    env: prd
  quickSync:
    multiTarget:
      - provider:
          name: cluster-hoge # platform provider name
        resourceDir: ./cluster-hoge # the resource dir
      - provider:
          name: cluster-fuga
        resourceDir: ./cluster-fuga
```
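To make the fan-out concrete, here is a minimal Python sketch (not PipeCD code; function names and the in-memory targets are illustrative, taken from the example above) of one quickSync execution applying to every `multiTarget` entry concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative multiTarget entries mirroring quickSync.multiTarget:
# each pairs a platform provider name with a resource dir.
TARGETS = [
    {"provider": "cluster-hoge", "resourceDir": "./cluster-hoge"},
    {"provider": "cluster-fuga", "resourceDir": "./cluster-fuga"},
]

def apply_manifests(target):
    """Stand-in for rendering and applying manifests to one cluster."""
    return f"applied {target['resourceDir']} via {target['provider']}"

def quick_sync(targets):
    """Apply to every target concurrently, as one quickSync execution."""
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        return list(pool.map(apply_manifests, targets))

print(quick_sync(TARGETS))
```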

**Rollback**

Similarly, when rolling back, the environments specified in `multiTarget` are rolled back at the same time.
If at least one of the per-target rollbacks succeeds, we consider the rollback successful.
This ensures the rollback still runs for the other environments even if one deployment environment is unreachable.
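The "at least one success" rule can be sketched as follows (illustrative Python, not PipeCD code; `fake_rollback` simulates one unreachable cluster):

```python
def rollback_all(targets, rollback):
    """Roll back every target; treat the overall rollback as successful
    if at least one per-target rollback succeeds."""
    results = {}
    for t in targets:
        try:
            rollback(t)
            results[t] = True
        except Exception:
            # An unreachable cluster must not block the other rollbacks.
            results[t] = False
    return any(results.values()), results

def fake_rollback(target):
    """Simulated rollback that fails for one cluster."""
    if target == "cluster-fuga":
        raise RuntimeError("cluster unreachable")

ok, per_target = rollback_all(["cluster-hoge", "cluster-fuga"], fake_rollback)
print(ok, per_target)  # overall success despite one unreachable cluster
```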



### PipelineSync


Piped asynchronously applies to each environment based on the platform provider and `resourceDir` specified for each stage.

For example, consider deploying a microservice called `microservice-a` to two clusters, `cluster-hoge` and `cluster-fuga`.
First, we prepare one application with one `app.pipecd.yaml` and manifests laid out as below.
Set the `multiTarget` item under `spec.quickSync` in `app.pipecd.yaml` as before, and also set `multiTarget` in each stage's config.
This lets a single stage execution apply the application to multiple environments at the same time.

```
microservice-a
└── prd
├── app.pipecd.yaml
├── base
│   ├── deployment.yaml
│   ├── kustomization.yaml
│   └── service.yaml
├── cluster-hoge
│   └── kustomization.yaml
├── cluster-fuga
│   └── kustomization.yaml
└── kustomization.yaml
```

```yaml
apiVersion: pipecd.dev/v1beta1
kind: KubernetesApp
spec:
  name: multi-cluster-app
  labels:
    env: example
    team: product
  quickSync:
    prune: true
    multiTarget:
      - provider:
          name: cluster-hoge
        resourceDir: ./cluster-hoge
      - provider:
          name: cluster-fuga
        resourceDir: ./cluster-fuga
  pipeline:
    stages:
      - name: K8S_CANARY_ROLLOUT
        with:
          replicas: 10%
          multiTarget:
            - provider:
                name: cluster-hoge
              resourceDir: ./cluster-hoge
            - provider:
                name: cluster-fuga
              resourceDir: ./cluster-fuga
      ...
```
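Assuming a stage's own `multiTarget` takes precedence over the application-level list when both are present (an assumption for illustration, not confirmed API behavior), per-stage target resolution could be sketched like this in Python:

```python
# A pared-down dict mirroring the app config above (structure assumed).
APP_CONFIG = {
    "quickSync": {
        "multiTarget": [
            {"provider": {"name": "cluster-hoge"}, "resourceDir": "./cluster-hoge"},
            {"provider": {"name": "cluster-fuga"}, "resourceDir": "./cluster-fuga"},
        ],
    },
    "pipeline": {
        "stages": [
            {
                "name": "K8S_CANARY_ROLLOUT",
                "with": {
                    "replicas": "10%",
                    # This stage targets only one cluster.
                    "multiTarget": [
                        {"provider": {"name": "cluster-hoge"}, "resourceDir": "./cluster-hoge"},
                    ],
                },
            },
        ],
    },
}

def stage_targets(stage, app):
    """Resolve which targets a stage applies to: its own multiTarget
    if present, else the application-level quickSync list (assumption)."""
    own = stage.get("with", {}).get("multiTarget")
    return own or app["quickSync"]["multiTarget"]

stage = APP_CONFIG["pipeline"]["stages"][0]
print([t["provider"]["name"] for t in stage_targets(stage, APP_CONFIG)])
```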

**Rollback**

When rolling back, the environments specified in `spec.quickSync.multiTarget` are rolled back at the same time.
If at least one of the per-target rollbacks succeeds, we consider the rollback successful.
This ensures the rollback still runs for the other environments even if one deployment environment is unreachable.


#### Stages to be supported

We introduce the feature into the stages that change resources on the cluster.

- K8S_PRIMARY_ROLLOUT
- K8S_CANARY_ROLLOUT
- K8S_CANARY_CLEAN
- K8S_BASELINE_ROLLOUT
- K8S_BASELINE_CLEAN
- K8S_TRAFFIC_ROUTING

### How to check the stage progress of each platform provider in the deployment

Users can check the stage logs for each platform provider.
In the future, we will consider visualizing the deployment status for each platform provider.

![image](assets/0014-stage-log.png)


### Livestate View & Drift Detection


Currently, a livestate store exists for each platform provider.
Both the Livestate View and drift detection use values obtained from the livestate store based on the appID,
and an application : platform provider = 1 : 1 relationship is assumed.

So we propose aggregating all the state from each platform provider by appID.
This achieves an application : platform provider = 1 : N relationship.
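The proposed aggregation can be sketched as follows (illustrative Python; the store layout, appIDs, and resource names are hypothetical):

```python
# Hypothetical per-provider livestate stores keyed by appID.
LIVESTATE_STORES = {
    "cluster-hoge": {"app-1": ["Deployment/microservice-a"]},
    "cluster-fuga": {"app-1": ["Deployment/microservice-a", "Service/microservice-a"]},
}

def aggregate_livestate(app_id, stores):
    """Collect livestate for app_id from every provider's store,
    giving the application : provider = 1 : N view this RFC proposes."""
    return {
        provider: store[app_id]
        for provider, store in stores.items()
        if app_id in store
    }

print(aggregate_livestate("app-1", LIVESTATE_STORES))
```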

**Livestate View**

Show livestate of all platform providers deployed by app.

**Drift Detection**

Performs Drift Detection based on the livestate of all platform providers deployed by the app.

### [option] Improve kubeconfig setup on piped

Currently, we need to prepare the kubeconfig file manually, but it would be nice to prepare it automatically.

This could be realized with cloud vendor features, for example Workload Identity on GKE or IRSA on EKS: piped would fetch the kubeconfig at startup using them.
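As an illustrative sketch of this option (cluster names and regions are placeholders, and it assumes piped runs with a GCP service account via Workload Identity or an AWS IAM role via IRSA), piped could generate kubeconfig entries at startup with the standard cloud CLIs:

```shell
# GKE: uses the ambient Workload Identity credentials;
# cluster name and region are placeholders.
gcloud container clusters get-credentials cluster-hoge --region asia-northeast1

# EKS: uses the ambient IRSA role credentials;
# cluster name and region are placeholders.
aws eks update-kubeconfig --name cluster-fuga --region ap-northeast-1
```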

# Alternatives

## Idea: Execute Stages in parallel within a pipeline

![image](assets/0014-pipeline-paralell-stage.png)

### UX

- When registering an application
  - Prepare manifests for each cluster and one app.pipecd.yaml, then register on the UI.
  - Dir structure

```
- /prd
  - app.pipecd.yaml
  - /base
  - /cluster-hoge
  - /cluster-fuga
```

- When deploying
  - Sync all clusters corresponding to prd.

- When rolling back
  - Roll back all clusters to the previous state.

### Pros & Cons

**Pros**

- Only one app config is required.
- You can operate WaitApproval for all clusters in one place.
- Flexible stage pipeline.

**Cons**

- Supporting parallel execution of stages complicates the scheduler mechanism.

## Idea: Deploy to multiple platform providers internally

![image](assets/0014-pipeline-already-implemented.png)

A PoC of this has already been implemented:
- https://github.com/pipe-cd/pipecd/pull/3790
- https://github.com/pipe-cd/pipecd/pull/3854

### UX

- When registering an application
  - Prepare manifests for each cluster and one app.pipecd.yaml, then register on the UI.
  - Dir structure

```
- /prd
  - app.pipecd.yaml
  - /base
  - /cluster-hoge
  - /cluster-fuga
```

- When deploying
  - Sync all clusters corresponding to prd.

- When rolling back
  - Roll back all clusters to the previous state.

### Pros & Cons

**Pros**

- Only one app setting is required.
- You can operate WaitApproval for all clusters in one place.

**Cons**

- Cannot support cases where you want to change the number of replicas for only some clusters.

## Idea: Create a stage to sync apps

![image](assets/0014-pipeline-sync-app-stage-01.png)

![image](assets/0014-pipeline-sync-app-stage-02.png)

### UX

- When registering an application
  - Prepare one app.pipecd.yaml as a root application with a sync-app stage.
  - Prepare manifests and an app.pipecd.yaml for each cluster, then register on the UI.
  - Dir structure

```
- /prd
  - app.pipecd.yaml
  - /base
  - /cluster-hoge
    - app.pipecd.yaml
  - /cluster-fuga
    - app.pipecd.yaml
```

- When deploying
  - Sync all clusters corresponding to prd when triggering the root app.
  - To sync only some clusters, sync each application individually.

- When rolling back
  - Roll back all clusters to the previous state.
  - You can select the following behavior via the stage settings:
    - Roll back if any app fails
    - Roll back if all apps fail
  - If the deployments triggered by the sync-app stage have succeeded, start rolling back to the previous commit.
  - If the deployments triggered by the sync-app stage are in progress, cancel them.

### Pros & Cons

**Pros**

- It is possible to sync all clusters or only a subset.
- Deployment pipelines can be configured for each environment.

**Cons**

- It takes time to set up the app config.
- A mechanism to trigger application rollback is needed.
- You need to approve the WaitApproval stage for each app.
- Deployment Chain already exists as a similar feature.