Merge pull request #7234 from ZhiHanZ/add-databend-docs

docs: add doc about how to deploy databend on kubernetes

mergify[bot] authored Aug 22, 2022, commit 9a3e045 (parents b11999b + fc86893)

Changed file: docs/doc/10-deploy/04-deploying-databend-on-kubernetes.md (160 additions, 0 deletions)

---
title: Start a Query Cluster on Kubernetes
sidebar_label: K8s Cluster
description: How to deploy a Databend query cluster on Kubernetes.
---

:::tip

Expected deployment time: **5 minutes ⏱**

:::

This tutorial shows how to install and configure a Databend query cluster on Kubernetes with a MinIO storage backend.

## Before you begin

* Make sure your cluster has enough resources for the installation (at least 4 CPUs, 4GB RAM, and 50GB of disk).
* Make sure you have a Kubernetes cluster up and running. For a local test cluster, see [k3d](https://k3d.io/v5.3.0/) or [minikube](https://minikube.sigs.k8s.io/docs/start/).
* Databend cluster mode works only with shared storage (AWS S3 or S3-compatible storage such as MinIO).
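
To check the CPU and memory part of the first requirement before installing, you can sum the allocatable resources Kubernetes reports for your nodes. The sketch below is a minimal, hypothetical helper set (`cpu_to_millicores`, `mem_to_mib`, `sum_quantities` are not part of Databend). It assumes "4GB" means 4 GiB and only understands whole-core or `m`-suffixed CPU values and `Ki`/`Mi`/`Gi` memory values; disk space is not checked.

```shell
# Minimums from the list above. "4GB" is read here as 4 GiB (assumption).
MIN_CPU_MILLICORES=4000
MIN_MEMORY_MIB=4096

# Convert a CPU quantity such as "4" or "3500m" to millicores.
# Assumption: whole cores or an "m" suffix only (fractions like "0.5" are not handled).
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *) echo $(( $1 * 1000 )) ;;
  esac
}

# Convert a memory quantity ("8029472Ki", "4096Mi", "4Gi", or plain bytes) to MiB.
# Assumption: decimal suffixes such as "4G" are not handled.
mem_to_mib() {
  case "$1" in
    *Ki) echo $(( ${1%Ki} / 1024 )) ;;
    *Mi) echo "${1%Mi}" ;;
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *) echo $(( $1 / 1048576 )) ;;
  esac
}

# Sum a list of quantities using the named converter function.
sum_quantities() {
  converter=$1
  shift
  total=0
  for q in "$@"; do
    total=$(( total + $("$converter" "$q") ))
  done
  echo "$total"
}

# Usage against a live cluster (allocatable resources across all nodes):
#   cpus=$(kubectl get nodes -o jsonpath='{.items[*].status.allocatable.cpu}')
#   mems=$(kubectl get nodes -o jsonpath='{.items[*].status.allocatable.memory}')
#   [ "$(sum_quantities cpu_to_millicores $cpus)" -ge "$MIN_CPU_MILLICORES" ] &&
#   [ "$(sum_quantities mem_to_mib $mems)" -ge "$MIN_MEMORY_MIB" ] && echo "enough CPU and memory"
```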

## Deploy a sample Databend cluster with MinIO

### Step 1. Install MinIO

:::caution
This configuration is for demonstration ONLY. Never use it in production. See
https://docs.min.io/docs/deploy-minio-on-kubernetes.html
for production TLS and high-availability configurations.
:::

We will bootstrap a MinIO server on Kubernetes with the following configuration:

```shell title="minio-server-config"
STORAGE_TYPE=s3
STORAGE_S3_BUCKET=sample-storage
STORAGE_S3_REGION=us-east-1
STORAGE_S3_ENDPOINT_URL=http://minio.minio.svc.cluster.local:9000
STORAGE_S3_ACCESS_KEY_ID=minio
STORAGE_S3_SECRET_ACCESS_KEY=minio123
```

Run the following commands against the target Kubernetes cluster. They create the `minio` namespace and deploy a MinIO server with a bucket named `sample-storage` and `10Gi` of storage space:

```shell title="minio-server-deployment"
kubectl create namespace minio --dry-run=client -o yaml | kubectl apply -f -
kubectl apply -f https://raw.githubusercontent.com/datafuselabs/databend/main/scripts/kubernetes/minio-sample.yaml -n minio
```
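
Once the MinIO pod is running, you can check that the `sample-storage` bucket is reachable with the MinIO client, `mc`, which reads the endpoint and credentials for an alias from an `MC_HOST_<alias>` environment variable. The sketch below builds that value from the configuration above. The helper `mc_host_url`, the pod name `mc-check`, and the alias `sample` are hypothetical, and the `kubectl run` line assumes the `minio/mc` image uses `mc` as its entrypoint.

```shell
# Sample values from the configuration above; set these variables beforehand to override.
: "${STORAGE_S3_ENDPOINT_URL:=http://minio.minio.svc.cluster.local:9000}"
: "${STORAGE_S3_ACCESS_KEY_ID:=minio}"
: "${STORAGE_S3_SECRET_ACCESS_KEY:=minio123}"
: "${STORAGE_S3_BUCKET:=sample-storage}"

# Hypothetical helper: build an MC_HOST_<alias> value for the MinIO client,
# in the form <scheme>://<access-key>:<secret-key>@<host:port>.
mc_host_url() {
  scheme=${STORAGE_S3_ENDPOINT_URL%%://*}
  hostport=${STORAGE_S3_ENDPOINT_URL#*://}
  printf '%s://%s:%s@%s\n' "$scheme" "$STORAGE_S3_ACCESS_KEY_ID" "$STORAGE_S3_SECRET_ACCESS_KEY" "$hostport"
}

# Usage: list the bucket from a throwaway pod inside the cluster
# (assumption: the minio/mc image's entrypoint is "mc"; the alias "sample" is arbitrary).
#   kubectl run mc-check -n minio --rm -it --restart=Never --image=minio/mc \
#     --env="MC_HOST_sample=$(mc_host_url)" -- ls "sample/${STORAGE_S3_BUCKET}"
```

The in-cluster endpoint only resolves from inside Kubernetes, which is why the check runs in a pod rather than on your workstation.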

### Step 2. Deploy a standalone Databend meta-service layer

The following commands deploy a standalone Databend meta-service in the `databend-system` namespace:

```shell title="databend-meta-service-deployment"
kubectl create namespace databend-system --dry-run=client -o yaml | kubectl apply -f -
kubectl apply -f https://raw.githubusercontent.com/datafuselabs/databend/main/scripts/kubernetes/meta-standalone.yaml -n databend-system
```
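
The meta-service pods may take a little while to pull images and become Ready, and `kubectl wait` can fail right away if the pods have not been created yet. A minimal sketch of a retry wrapper around it follows; `retry` is a hypothetical helper, and the 30 attempts, 5 seconds apart, are an arbitrary budget.

```shell
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY seconds
# between failed attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  until "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 0
}

# Usage: wait until every pod in the databend-system namespace is Ready.
#   retry 30 5 kubectl wait --for=condition=Ready pod --all -n databend-system --timeout=10s
```

The same wrapper works for the `minio` and `tenant1` namespaces by changing `-n`.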
### Step 3. Deploy a Databend query cluster

The following commands deploy a Databend query cluster in the `tenant1` namespace. Each pod in the deployment has `900m` vCPU and `900Mi` of memory.

```shell title="databend-query-service-deployment"
kubectl create namespace tenant1 --dry-run=client -o yaml | kubectl apply -f -
kubectl apply -f https://raw.githubusercontent.com/datafuselabs/databend/main/scripts/kubernetes/query-cluster.yaml -n tenant1
```
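
To estimate what a given number of query pods needs, multiply the per-pod figures by the replica count. A trivial sketch; `query_footprint` is a hypothetical helper that only uses the `900m` / `900Mi` figures stated above.

```shell
# Per-pod allocation stated above for the sample query deployment.
POD_CPU_MILLICORES=900
POD_MEMORY_MIB=900

# Print the combined CPU and memory of N query pods.
query_footprint() {
  replicas=$1
  echo "cpu=$(( replicas * POD_CPU_MILLICORES ))m memory=$(( replicas * POD_MEMORY_MIB ))Mi"
}
```

For example, `query_footprint 3` prints `cpu=2700m memory=2700Mi`: three query pods alone take most of the 4-CPU, 4GB minimum, and MinIO and the meta-service need room as well.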

To scale the query cluster up or down, use `kubectl scale`:

```shell
# Scale the query cluster down to 0 replicas
kubectl scale -n tenant1 deployment query --replicas=0
# Scale the query cluster up to 3 replicas
kubectl scale -n tenant1 deployment query --replicas=3
```
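
`kubectl scale` returns as soon as the new replica count is recorded, before the new pods are ready. A sketch that scales and then waits with `kubectl rollout status`, assuming the `tenant1` namespace and `query` deployment from the commands above; `scale_query` is a hypothetical helper name.

```shell
# Hypothetical helper: scale the sample query deployment, then wait for the rollout.
# Namespace "tenant1" and deployment "query" come from the commands above.
scale_query() {
  replicas=$1
  kubectl scale -n tenant1 deployment query --replicas="$replicas" &&
    kubectl rollout status -n tenant1 deployment/query --timeout=300s
}

# Usage:
#   scale_query 3
```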

### Step 3.1. Check the cluster information

:::note
Make sure that local port 3308 is available.
:::

```shell
nohup kubectl port-forward -n tenant1 svc/query-service 3308:3307 &
mysql -h127.0.0.1 -uroot -P3308
```
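
The `nohup ... &` command above leaves `kubectl port-forward` running in the background. To stop it cleanly later, one option is to record its PID. A minimal sketch; the helper names and the `port-forward.pid` / `port-forward.log` files are hypothetical, and this version omits `nohup`, so the forward may end with your terminal session.

```shell
# Hypothetical helpers around the port-forward from the command above.
# Start forwarding a local port (default 3308) to port 3307 of the query service
# in the background, and record the PID so it can be stopped later.
start_query_port_forward() {
  local_port=${1:-3308}
  kubectl port-forward -n tenant1 svc/query-service "${local_port}:3307" \
    > port-forward.log 2>&1 &
  echo $! > port-forward.pid
}

# Stop the background port-forward, if one was recorded.
stop_query_port_forward() {
  [ -f port-forward.pid ] || return 0
  kill "$(cat port-forward.pid)" 2>/dev/null || true
  rm -f port-forward.pid
}
```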

```sql
SELECT * FROM system.clusters;
```
```
+----------------------+------------+------+
| name                 | host       | port |
+----------------------+------------+------+
| dIUkzbOaqJEPudb0A7j4 | 172.17.0.6 | 9191 |
| NzfBm4KIQGEHe0sxAWa3 | 172.17.0.7 | 9191 |
| w3MuQR8aTHKHC1OLj5a6 | 172.17.0.5 | 9191 |
+----------------------+------------+------+
```
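
To check the cluster size from a script instead of the interactive client, you can run the same query through `mysql -e` and compare the count with what you expect. A sketch assuming the port-forward on 3308 and the passwordless `root` login used above; `cluster_node_count` and `expect_cluster_nodes` are hypothetical helpers.

```shell
# Hypothetical helper: count the nodes registered in system.clusters.
# -N (--skip-column-names) prints only the value; -e runs the statement and exits.
cluster_node_count() {
  mysql -h127.0.0.1 -uroot -P"${1:-3308}" -N -e "SELECT count(*) FROM system.clusters;"
}

# Fail unless the number of registered nodes equals the expected count.
expect_cluster_nodes() {
  expected=$1
  actual=$(cluster_node_count "${2:-3308}")
  if [ "$actual" != "$expected" ]; then
    echo "expected $expected nodes, found $actual" >&2
    return 1
  fi
}

# Usage, comparing against the deployment's replica count:
#   expect_cluster_nodes "$(kubectl get deployment query -n tenant1 -o jsonpath='{.spec.replicas}')"
```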

### Step 4. Run a distributed query

Use `EXPLAIN` to see how the cluster plans a distributed aggregation:

```text
EXPLAIN SELECT max(number), sum(number) FROM numbers_mt(10000000000) GROUP BY number % 3, number % 4, number % 5 LIMIT 10;

Limit: 10
RedistributeStage[expr: 0]
Projection: max(number):UInt64, sum(number):UInt64
AggregatorFinal: groupBy=[[(number % 3), (number % 4), (number % 5)]], aggr=[[max(number), sum(number)]]
RedistributeStage[expr: sipHash(_group_by_key)]
AggregatorPartial: groupBy=[[(number % 3), (number % 4), (number % 5)]], aggr=[[max(number), sum(number)]]
Expression: (number % 3):UInt8, (number % 4):UInt8, (number % 5):UInt8, number:UInt64 (Before GroupBy)
ReadDataSource: scan schema: [number:UInt64], statistics: [read_rows: 10000000000, read_bytes: 80000000000, partitions_scanned: 1000001, partitions_total: 1000001], push_downs: [projections: [0]]
```

The distributed query works: the `RedistributeStage` steps show cluster nodes exchanging intermediate data, which they transfer through the address configured by `flight_api_address`.

### Step 4.1. Load data into the cluster
```sql
CREATE TABLE t1(i INT, j INT);
```

```sql
INSERT INTO t1 SELECT number, number + 300 FROM numbers(10000000);
```
```sql
SELECT count(*) FROM t1;
```
```
+----------+
| count()  |
+----------+
| 10000000 |
+----------+
```


## Install a Databend Cluster with Helm

You can also install a Databend cluster with the official Helm [chart](https://github.com/datafuselabs/helm-charts).

### Install Meta Service

Install a standalone Databend meta service. See the chart's [values file](https://github.com/datafuselabs/helm-charts/blob/main/charts/databend-meta/values.yaml) for further configuration options, such as high availability.


```bash
helm repo add databend https://charts.databend.rs
helm install my-release databend/databend-meta --namespace databend --create-namespace
```

### Install Query Service

The following commands install the Databend query service with 3 replicas and register it with the meta service.

See the chart's [values file](https://github.com/datafuselabs/helm-charts/blob/main/charts/databend-query/values.yaml) for further configuration options, such as object storage secrets.

```bash
helm repo add databend https://charts.databend.rs
helm install query databend/databend-query --namespace databend --create-namespace \
--set config.meta.address=my-release-databend-meta.databend.svc.cluster.local:9191 \
--set replicaCount=3
```
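
The `config.meta.address` value above is a Kubernetes service DNS name of the form `<service>.<namespace>.svc.cluster.local:<port>`, where `my-release-databend-meta` combines the release name `my-release` with the chart name. The sketch below derives that address for other release names. It assumes the `databend-meta` chart names its service with the standard Helm scaffold `fullname` rule (not verified for this chart) and that the cluster uses the default `cluster.local` domain; `helm_fullname` and `meta_address` are hypothetical helpers.

```shell
# Hypothetical helper mirroring the standard Helm scaffold "fullname" rule (assumed,
# not verified for the databend-meta chart): use the release name alone if it already
# contains the chart name, otherwise "<release>-<chart>"; truncate to 63 characters
# and drop one trailing "-".
helm_fullname() {
  release=$1
  chart=$2
  case "$release" in
    *"$chart"*) name=$release ;;
    *) name="$release-$chart" ;;
  esac
  name=$(printf '%s' "$name" | cut -c1-63)
  printf '%s\n' "${name%-}"
}

# Build <service>.<namespace>.svc.cluster.local:<port> for the meta service.
# Assumes the default cluster domain and the meta port 9191 used in the command above.
meta_address() {
  release=$1
  namespace=$2
  port=${3:-9191}
  printf '%s.%s.svc.cluster.local:%s\n' "$(helm_fullname "$release" databend-meta)" "$namespace" "$port"
}
```

With this, the install command could pass `--set config.meta.address="$(meta_address my-release databend)"` instead of the hand-written address.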
