databendlabs · mergify · Aug 22, 2022 · Aug 22, 2022 · Aug 22, 2022
diff --git a/docs/doc/10-deploy/04-deploying-databend-on-kubernetes.md b/docs/doc/10-deploy/04-deploying-databend-on-kubernetes.md
@@ -0,0 +1,160 @@
+---
+title: Start a Query Cluster on Kubernetes
+sidebar_label: K8s Cluster
+description:
+    How to deploy a Databend query cluster on Kubernetes.
+---
+
+:::tip
+
+Expected deployment time: ** 5 minutes ⏱ **
+
+:::
+
+This tutorial covers how to install and configure the Databend query cluster on kubernetes with minio storage backend.
+## Before you begin
+
+* Make sure your cluster have enough resource for installation (at least 4 CPUs, 4GB RAM, 50GB disk)
+* Make sure you have a kubernetes cluster up and running. Please take a look at [k3d](https://k3d.io/v5.3.0/) or [minikube](https://minikube.sigs.k8s.io/docs/start/).
+* Databend Cluster mode only works on shared storage(AWS S3 or MinIO s3-like storage).
+
+## Deploy a sample databend cluster with minio
+
+### Step 1 Install Minio
+:::caution
+This configuration is for demonstration ONLY, never use it in production, please take a look at
+https://docs.min.io/docs/deploy-minio-on-kubernetes.html
+for more information on production TLS and High Availability configurations.
+:::
+
+We will bootstrap a minio server on kubernetes, with the following configurations
+
+```shell title="minio-server-config"
+STORAGE_TYPE=s3
+STORAGE_S3_BUCKET=sample-storage
+STORAGE_S3_REGION=us-east-1
+STORAGE_S3_ENDPOINT_URL=http://minio.minio.svc.cluster.local:9000
+STORAGE_S3_ACCESS_KEY_ID=minio
+STORAGE_S3_SECRET_ACCESS_KEY=minio123
+```
+
+The following configuration shall be applied to the target kubernetes cluster, it would create a bucket named `sample-storage` with `10Gi` storage space
+
+```shell title="minio-server-deployment"
+kubectl create namespace minio --dry-run=client -o yaml | kubectl apply -f -
+kubectl apply -f https://raw.githubusercontent.com/datafuselabs/databend/main/scripts/kubernetes/minio-sample.yaml -n minio
+```
+
+### Step 2. Deploy standalone databend meta-service layer
+
+The following configuration would configure a standalone databend meta-service on `databend-system` namespace
+
+```shell title="databend-meta-service-deployment"
+kubectl create namespace databend-system --dry-run=client -o yaml | kubectl apply -f -
+kubectl apply -f https://raw.githubusercontent.com/datafuselabs/databend/main/scripts/kubernetes/meta-standalone.yaml -n databend-system
+```
+### Step 3. Deploy databend query cluster
+
+The following configuration would configure a databend query cluster on `tenant1` namespace
+Each pod under the deployment have `900m` vCPU with `900Mi` memory
+```shell title="databend-query-service-deployment"
+kubectl create namespace tenant1 --dry-run=client -o yaml | kubectl apply -f -
+kubectl apply -f https://raw.githubusercontent.com/datafuselabs/databend/main/scripts/kubernetes/query-cluster.yaml -n tenant1
+```
+
+To scale up or down the query cluster, please use the following command
+```shell
+ # scale query cluster number to 0
+ kubectl scale -n tenant1 deployment query --replicas=0
+ # scale query cluster number to 3
+ kubectl scale -n tenant1 deployment query --replicas=3
+ ```
+
+### 3.1 Check the Cluster Information
+***
+NOTICE: Please make sure that the localhost port 3308 is available.
+***
+```shell
+nohup kubectl port-forward -n tenant1 svc/query-service 3308:3307 &
+mysql -h127.0.0.1 -uroot -P3308
+```
+
+```sql
+SELECT * FROM system.clusters
+```
+```
++----------------------+------------+------+
+| name                 | host       | port |
++----------------------+------------+------+
+| dIUkzbOaqJEPudb0A7j4 | 172.17.0.6 | 9191 |
+| NzfBm4KIQGEHe0sxAWa3 | 172.17.0.7 | 9191 |
+| w3MuQR8aTHKHC1OLj5a6 | 172.17.0.5 | 9191 |
++----------------------+------------+------+
+```
+
+### Step 4. Distributed query
+
+```text
+EXPLAIN SELECT max(number), sum(number) FROM numbers_mt(10000000000) GROUP BY number % 3, number % 4, number % 5 LIMIT 10;
++-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| explain                                                                                                                                                                                                           |
++-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| Limit: 10                                                                                                                                                                                                         |
+|   RedistributeStage[expr: 0]                                                                                                                                                                                      |
+|     Projection: max(number):UInt64, sum(number):UInt64                                                                                                                                                            |
+|       AggregatorFinal: groupBy=[[(number % 3), (number % 4), (number % 5)]], aggr=[[max(number), sum(number)]]                                                                                                    |
+|         RedistributeStage[expr: sipHash(_group_by_key)]                                                                                                                                                           |
+|           AggregatorPartial: groupBy=[[(number % 3), (number % 4), (number % 5)]], aggr=[[max(number), sum(number)]]                                                                                              |
+|             Expression: (number % 3):UInt8, (number % 4):UInt8, (number % 5):UInt8, number:UInt64 (Before GroupBy)                                                                                                |
+|               ReadDataSource: scan schema: [number:UInt64], statistics: [read_rows: 10000000000, read_bytes: 80000000000, partitions_scanned: 1000001, partitions_total: 1000001], push_downs: [projections: [0]] |
++-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+```
+
+The distributed query works, the cluster will efficiently transfer data through `flight_api_address`.
+
+### Step 4.1. Upload the data to the cluster
+```sql
+CREATE TABLE t1(i INT, j INT);
+```
+
+```sql
+INSERT INTO t1 SELECT number, number + 300 from numbers(10000000);
+```
+```sql
+SELECT count(*) FROM t1;
+```
+```
++----------+
+| count()  |
++----------+
+| 10000000 |
++----------+
+```
+
+
+## Install Databend Cluster with Helm Chart
+
+We support to install Databend cluster with our official helm [chart](https://github.com/datafuselabs/helm-charts)
+
+### Install Meta Service
+Install a standalone databend meta service
+Please follow the [documentation](https://github.com/datafuselabs/helm-charts/blob/main/charts/databend-meta/values.yaml) for further configuration options like high availability
+
+
+```bash
+helm repo add databend https://charts.databend.rs
+helm install my-release databend/databend-meta --namespace databend --create-namespace
+```
+
+### Install Query Service
+
+The following command would regist the databend query service to the meta service with 3 nodes.
+
+Please follow the [documentation](https://github.com/datafuselabs/helm-charts/blob/main/charts/databend-query/values.yaml) for further configuration options like object storage secrets 
+
+```bash
+helm repo add databend https://charts.databend.rs
+helm install query databend/databend-query --namespace databend --create-namespace \
+          --set config.meta.address=my-release-databend-meta.databend.svc.cluster.local:9191 \
+          --set replicaCount=3 
+```