From aae9f48ad044a1c07862acf560586532a87e2b84 Mon Sep 17 00:00:00 2001
From: zhihanz
Date: Mon, 22 Aug 2022 18:19:54 +0800
Subject: [PATCH] add doc about how to deploy databend on kubernetes

---
 .../04-deploying-databend-on-kubernetes.md    | 160 ++++++++++++++++++
 1 file changed, 160 insertions(+)
 create mode 100644 docs/doc/10-deploy/04-deploying-databend-on-kubernetes.md

diff --git a/docs/doc/10-deploy/04-deploying-databend-on-kubernetes.md b/docs/doc/10-deploy/04-deploying-databend-on-kubernetes.md
new file mode 100644
index 0000000000000..aeb73ecd490e6
--- /dev/null
+++ b/docs/doc/10-deploy/04-deploying-databend-on-kubernetes.md
@@ -0,0 +1,160 @@
+---
+title: Start a Query Cluster on Kubernetes
+sidebar_label: K8s Cluster
+description:
+  How to deploy a Databend query cluster on Kubernetes.
+---
+
+:::tip
+
+Expected deployment time: **5 minutes ⏱**
+
+:::
+
+This tutorial covers how to install and configure a Databend query cluster on Kubernetes with a MinIO storage backend.
+## Before you begin
+
+* Make sure your cluster has enough resources for the installation (at least 4 CPUs, 4 GB RAM, 50 GB disk).
+* Make sure you have a Kubernetes cluster up and running. If you do not, take a look at [k3d](https://k3d.io/v5.3.0/) or [minikube](https://minikube.sigs.k8s.io/docs/start/).
+* Databend cluster mode only works on shared storage (AWS S3 or S3-compatible storage such as MinIO).
+
+## Deploy a sample Databend cluster with MinIO
+
+### Step 1. Install MinIO
+:::caution
+This configuration is for demonstration ONLY. Never use it in production. See
+https://docs.min.io/docs/deploy-minio-on-kubernetes.html
+for more information on production TLS and high-availability configurations.
+:::
+
+We will bootstrap a MinIO server on Kubernetes with the following configuration:
+
+```shell title="minio-server-config"
+STORAGE_TYPE=s3
+STORAGE_S3_BUCKET=sample-storage
+STORAGE_S3_REGION=us-east-1
+STORAGE_S3_ENDPOINT_URL=http://minio.minio.svc.cluster.local:9000
+STORAGE_S3_ACCESS_KEY_ID=minio
+STORAGE_S3_SECRET_ACCESS_KEY=minio123
+```
+
+Apply the following configuration to the target Kubernetes cluster. It creates a bucket named `sample-storage` with `10Gi` of storage space:
+
+```shell title="minio-server-deployment"
+kubectl create namespace minio --dry-run=client -o yaml | kubectl apply -f -
+kubectl apply -f https://raw.githubusercontent.com/datafuselabs/databend/main/scripts/kubernetes/minio-sample.yaml -n minio
+```
+
+### Step 2. Deploy a standalone Databend meta-service layer
+
+The following commands deploy a standalone Databend meta-service in the `databend-system` namespace:
+
+```shell title="databend-meta-service-deployment"
+kubectl create namespace databend-system --dry-run=client -o yaml | kubectl apply -f -
+kubectl apply -f https://raw.githubusercontent.com/datafuselabs/databend/main/scripts/kubernetes/meta-standalone.yaml -n databend-system
+```
+### Step 3. Deploy a Databend query cluster
+
+The following commands deploy a Databend query cluster in the `tenant1` namespace.
+Each pod under the deployment has `900m` vCPU and `900Mi` of memory.
+```shell title="databend-query-service-deployment"
+kubectl create namespace tenant1 --dry-run=client -o yaml | kubectl apply -f -
+kubectl apply -f https://raw.githubusercontent.com/datafuselabs/databend/main/scripts/kubernetes/query-cluster.yaml -n tenant1
+```
+
+To scale the query cluster up or down, use the following commands:
+```shell
+# scale the query cluster to 0 replicas
+kubectl scale -n tenant1 deployment query --replicas=0
+# scale the query cluster to 3 replicas
+kubectl scale -n tenant1 deployment query --replicas=3
+```
+
+### Step 3.1. Check the Cluster Information
+***
+NOTICE: Please make sure that local port 3308 is available.
+***
+```shell
+nohup kubectl port-forward -n tenant1 svc/query-service 3308:3307 &
+mysql -h127.0.0.1 -uroot -P3308
+```
+
+```sql
+SELECT * FROM system.clusters;
+```
+```
++----------------------+------------+------+
+| name                 | host       | port |
++----------------------+------------+------+
+| dIUkzbOaqJEPudb0A7j4 | 172.17.0.6 | 9191 |
+| NzfBm4KIQGEHe0sxAWa3 | 172.17.0.7 | 9191 |
+| w3MuQR8aTHKHC1OLj5a6 | 172.17.0.5 | 9191 |
++----------------------+------------+------+
+```
+
+### Step 4. Distributed query
+
+```text
+EXPLAIN SELECT max(number), sum(number) FROM numbers_mt(10000000000) GROUP BY number % 3, number % 4, number % 5 LIMIT 10;
++-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| explain                                                                                                                                                                                                           |
++-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| Limit: 10                                                                                                                                                                                                         |
+|   RedistributeStage[expr: 0]                                                                                                                                                                                      |
+|     Projection: max(number):UInt64, sum(number):UInt64                                                                                                                                                            |
+|       AggregatorFinal: groupBy=[[(number % 3), (number % 4), (number % 5)]], aggr=[[max(number), sum(number)]]                                                                                                    |
+|         RedistributeStage[expr: sipHash(_group_by_key)]                                                                                                                                                           |
+|           AggregatorPartial: groupBy=[[(number % 3), (number % 4), (number % 5)]], aggr=[[max(number), sum(number)]]                                                                                              |
+|             Expression: (number % 3):UInt8, (number % 4):UInt8, (number % 5):UInt8, number:UInt64 (Before GroupBy)                                                                                                |
+|               ReadDataSource: scan schema: [number:UInt64], statistics: [read_rows: 10000000000, read_bytes: 80000000000, partitions_scanned: 1000001, partitions_total: 1000001], push_downs: [projections: [0]] |
++-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+```
+
+The `RedistributeStage` steps in the plan show that the query is distributed; the nodes exchange data through `flight_api_address`.
+
+### Step 4.1. Upload data to the cluster
+```sql
+CREATE TABLE t1(i INT, j INT);
+```
+
+```sql
+INSERT INTO t1 SELECT number, number + 300 FROM numbers(10000000);
+```
+```sql
+SELECT count(*) FROM t1;
+```
+```
++----------+
+| count()  |
++----------+
+| 10000000 |
++----------+
+```
+
+
+## Install Databend Cluster with Helm Chart
+
+We support installing a Databend cluster with our official Helm [chart](https://github.com/datafuselabs/helm-charts).
+
+### Install Meta Service
+Install a standalone Databend meta service.
+See the [documentation](https://github.com/datafuselabs/helm-charts/blob/main/charts/databend-meta/values.yaml) for further configuration options, such as high availability.
+
+
+```bash
+helm repo add databend https://charts.databend.rs
+helm install my-release databend/databend-meta --namespace databend --create-namespace
+```
+
+### Install Query Service
+
+The following command registers the Databend query service with the meta service and starts 3 query nodes.
+
+See the [documentation](https://github.com/datafuselabs/helm-charts/blob/main/charts/databend-query/values.yaml) for further configuration options, such as object storage secrets.
+
+```bash
+helm repo add databend https://charts.databend.rs
+helm install query databend/databend-query --namespace databend --create-namespace \
+  --set config.meta.address=my-release-databend-meta.databend.svc.cluster.local:9191 \
+  --set replicaCount=3
+```
\ No newline at end of file
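
The `config.meta.address` value passed to the query chart follows the Kubernetes service DNS pattern `<service>.<namespace>.svc.cluster.local:<port>`. As a minimal sketch, assuming the meta chart names its Service `<release>-databend-meta` (which is what the address in the Helm command suggests; the variable names below are illustrative only), the address can be derived from the release name and namespace rather than typed by hand:

```shell
# Illustrative values only; they mirror the Helm commands in the patch.
RELEASE="my-release"   # Helm release name used for databend-meta
NAMESPACE="databend"   # namespace passed via --namespace
PORT=9191              # port used in config.meta.address

# Assumption: the chart exposes the meta service as "<release>-databend-meta".
META_ADDRESS="${RELEASE}-databend-meta.${NAMESPACE}.svc.cluster.local:${PORT}"
echo "${META_ADDRESS}"
```

The resulting value could then be supplied as `--set config.meta.address="${META_ADDRESS}"` when installing the query chart, which keeps the two releases in sync if the release name or namespace changes.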