diff --git a/Makefile b/Makefile
index ce06ffb1..56290d21 100644
--- a/Makefile
+++ b/Makefile
@@ -6,7 +6,7 @@ IMG ?= "datainfrahq/druid-operator"
# Local Image URL to be pushed to kind registry
IMG_KIND ?= "localhost:5001/druid-operator"
# NAMESPACE for druid operator e2e
-NAMESPACE_DRUID_OPERATOR ?= "druid-operator"
+NAMESPACE_DRUID_OPERATOR ?= "druid-operator-system"
# NAMESPACE for zk operator e2e
NAMESPACE_ZK_OPERATOR ?= "zk-operator"
# NAMESPACE for zk operator e2e
diff --git a/README.md b/README.md
index e7d34d73..d1b35d2f 100644
--- a/README.md
+++ b/README.md
@@ -6,17 +6,22 @@
Kubernetes Operator For Apache Druid
-**This is the official [druid-operator](https://github.com/druid-io/druid-operator) project, now maintained by [Maintainers.md](./MAINTAINERS.md).
+**This is the official [druid-operator](https://github.com/druid-io/druid-operator) project, now maintained by the maintainers listed in [MAINTAINERS.md](./MAINTAINERS.md).
[druid-operator](https://github.com/druid-io/druid-operator) is deprecated. Refer to this [issue](https://github.com/druid-io/druid-operator/issues/329) and [PR](https://github.com/druid-io/druid-operator/pull/336). Feel free to open issues and PRs! Collaborators are welcome!**
![Build Status](https://github.com/datainfrahq/druid-operator/actions/workflows/docker-image.yml/badge.svg) ![Docker pull](https://img.shields.io/docker/pulls/datainfrahq/druid-operator.svg) [![Latest Version](https://img.shields.io/github/tag/datainfrahq/druid-operator)](https://github.com/datainfrahq/druid-operator/releases) [![Slack](https://img.shields.io/badge/slack-brightgreen.svg?logo=slack&label=Community&style=flat&color=%2373DC8C&)](https://kubernetes.slack.com/archives/C04F4M6HT2L)
+
-
-
- Druid Operator provisions and manages [Apache Druid](https://druid.apache.org/) cluster on kubernetes. Druid Operator is designed to provision and manage [Apache Druid](https://druid.apache.org/) in distributed mode only. It is built using the [kubebuilder](https://github.com/kubernetes-sigs/kubebuilder). Language used is GoLang. Druid Operator is available on [operatorhub.io](https://operatorhub.io/operator/druid-operator) Refer to [Documentation](./docs/README.md) for getting started. Join Kubernetes slack and join [druid-operator](https://kubernetes.slack.com/archives/C04F4M6HT2L)
+Druid Operator provisions and manages [Apache Druid](https://druid.apache.org/) clusters on Kubernetes.
+Druid Operator is designed to provision and manage [Apache Druid](https://druid.apache.org/) in distributed mode only.
+It is built in Golang using [kubebuilder](https://github.com/kubernetes-sigs/kubebuilder).
+Druid Operator is available on [operatorhub.io](https://operatorhub.io/operator/druid-operator).
+Refer to [Documentation](./docs/README.md) for getting started.
+
+Feel free to join the Kubernetes Slack and find us in the [druid-operator](https://kubernetes.slack.com/archives/C04F4M6HT2L) channel.
### Talks and Blogs on Druid Operator
@@ -37,14 +42,11 @@
### Notifications
-- The project moved to Kubebuilder v3 which requires a [manual change](docs/kubebuilder_v3_migration.md) in the operator.
-- Users may experience HPA issues with druid-operator with release 0.0.5, as described in the [issue](https://github.com/druid-io/druid-operator/issues/160).
-- The latest release 0.0.6 has fixes for the above issue.
-- The operator has moved from HPA apiVersion autoscaling/v2beta1 to autoscaling/v2 API users will need to update there HPA Specs according v2beta2 api in order to work with the latest druid-operator release.
-- Users may experience pvc deletion [issue](https://github.com/druid-io/druid-operator/issues/186) in release 0.0.6, this issue has been fixed in patch release 0.0.6.1.
+- The project moved to Kubebuilder v3, which requires a [manual change](docs/kubebuilder_v3_migration.md) in the operator.
+- Users are encouraged to use operator version 0.0.9+.
+- The operator has moved from HPA apiVersion autoscaling/v2beta1 to autoscaling/v2. Users will need to update their HPA specs according to the v2 API in order to work with the latest druid-operator release.
- druid-operator has moved the Ingress apiVersion from networking/v1beta1 to networking/v1. Users will need to update their Ingress spec in the druid CR according to the networking/v1 syntax. In case users are using a schema-validated CRD, the CRD will also need to be updated.
-- druid-operator has moved PodDisruptionBudget apiVersion policy/v1beta1 to policy/v1. Users will need to update there Kubernetes versions to 1.21+ to use druid-operator tag 0.0.9+.
-- The latest release for druid-operator is v1.0.0, this release is compatible with k8s version 1.25. HPA API is kept to version v2beta2.
+- The v1.0.0 release of druid-operator is compatible with k8s version 1.25. The HPA API is kept at version v2beta2.
### Kubernetes version compatibility
@@ -57,7 +59,7 @@
### Contributors
-
+
### Note
Apache®, [Apache Druid, Druid®](https://druid.apache.org/) are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. This project, druid-operator, is not an Apache Software Foundation project.
diff --git a/chart/values.yaml b/chart/values.yaml
index 5a32a04d..143505df 100644
--- a/chart/values.yaml
+++ b/chart/values.yaml
@@ -4,8 +4,8 @@
env:
DENY_LIST: "default,kube-system" # Comma-separated list of namespaces to ignore
- RECONCILE_WAIT: "10s" # Reconciliation delay
- WATCH_NAMESPACE: "" # Namespace to watch or empty string to watch all namespaces, To watch multiple namespaces add , into string. Ex: WATCH_NAMESPACE: "ns1,ns2,ns3"
+ RECONCILE_WAIT: "10s" # Reconciliation delay
+ WATCH_NAMESPACE: "" # Namespace to watch, or an empty string to watch all namespaces. To watch multiple namespaces, separate them with commas, e.g. WATCH_NAMESPACE: "ns1,ns2,ns3"
#MAX_CONCURRENT_RECONCILES: "" # MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run.
replicaCount: 1
@@ -46,6 +46,9 @@ podAnnotations: {}
podSecurityContext:
runAsNonRoot: true
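+ # 65532 is the UID/GID of the "nonroot" user shipped in distroless images (an assumption about the operator's base image)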
+ fsGroup: 65532
+ runAsUser: 65532
+ runAsGroup: 65532
securityContext:
allowPrivilegeEscalation: false
diff --git a/docs/dev_doc.md b/docs/dev_doc.md
index 9076daed..f5c0e5e0 100644
--- a/docs/dev_doc.md
+++ b/docs/dev_doc.md
@@ -1,7 +1,7 @@
## Dev Dependencies
-- Golang 1.19+
-- Kubebuilder 2.3.1+
+- Golang 1.20+
+- Kubebuilder v3
## Running Operator Locally
diff --git a/docs/druid_cr.md b/docs/druid_cr.md
index 91014549..176e81a1 100644
--- a/docs/druid_cr.md
+++ b/docs/druid_cr.md
@@ -5,7 +5,6 @@
- For full details on spec refer to ```pkg/apis/druid/v1alpha1/druid_types.go```
- The operator supports both deployments and statefulsets for druid nodes. ```kind``` can be set in the druid NodeSpec to ```Deployment``` / ```StatefulSet```.
```NOTE: The default behavior provisions all the nodes as statefulsets.```
-
- The following are cluster scoped and common to all the druid nodes.
```yaml
@@ -46,13 +45,13 @@ spec:
common.runtime.properties: |
```
- - The following are specific to a node.
+- The following are specific to a node.
```yaml
nodes:
# String value, can be anything to define a node name.
brokers:
- # nodeType can be broker,historical, middleManager, indexer, router, coordinator and overlord.
+ # nodeType can be broker, historical, middleManager, indexer, router, coordinator and overlord.
# Required Key
nodeType: "broker"
# Optionally specify for broker nodes
@@ -67,4 +66,5 @@ spec:
# Runtime Properties for the node
# Required Key
runtime.properties: |
+ ...
```
diff --git a/docs/features.md b/docs/features.md
index 68cd41c9..3645b94d 100644
--- a/docs/features.md
+++ b/docs/features.md
@@ -14,58 +14,72 @@
## Deny List in Operator
+
- There may be use cases where we want the operator to watch all namespaces but exclude a few, for reasons such as security or testing flexibility.
- The druid operator supports such cases. In ```deploy/operator.yaml```, the user can enable the ```DENY_LIST``` env and pass the namespaces to be excluded.
- Each namespace should be separated by a comma, as in the sketch below.
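+
+A minimal sketch of the env entry in the operator Deployment (the values mirror the chart defaults in `chart/values.yaml`):
+
+```yaml
+env:
+  - name: DENY_LIST
+    value: "default,kube-system"   # comma-separated namespaces the operator will ignore
+```
+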
## Reconcile Time in Operator
+
- As per the operator pattern, the druid operator reconciles every 10s (the default reconcile time) to make sure the desired state (the druid CR) is in sync with the current state.
- In case the user wants to adjust the reconcile time, it can be done via an env variable in ```deploy/operator.yaml```: enable the ```RECONCILE_WAIT``` env and pass in a value suffixed with ```s``` (example: 30s), as sketched below. The default time is 10s.
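+
+A sketch of the env entry (30s is just an illustrative value):
+
+```yaml
+env:
+  - name: RECONCILE_WAIT
+    value: "30s"   # reconcile interval; the default is 10s
+```
+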
## Finalizer in Druid CR
+
- Druid Operator supports provisioning of sts as well as deployments. When an sts is created, a pvc is created along with it. When the druid CR is deleted, the sts controller does not delete the pvc's associated with the sts.
-- In case user does care about pvc data and wishes to reclaim it, user can enable ```DisablePVCDeletionFinalizer: true``` in druid CR.
+- In case the user cares about the pvc data and wishes to reclaim it, the user can set ```DisablePVCDeletionFinalizer: true``` in the druid CR (see the sketch after this list).
- The default behavior triggers finalizers and pre-delete hooks, which first clean up the sts and then the pvc's referenced by the sts.
- The default is set to true, i.e. after deletion of the CR, any pvc's provisioned by the sts shall be deleted.
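+
+A minimal sketch of a druid CR that retains its pvc's (assuming the CR apiVersion `druid.apache.org/v1alpha1` and the lower-camel field spelling `disablePVCDeletionFinalizer`; the cluster name is hypothetical):
+
+```yaml
+apiVersion: "druid.apache.org/v1alpha1"
+kind: "Druid"
+metadata:
+  name: my-druid-cluster   # hypothetical name
+spec:
+  disablePVCDeletionFinalizer: true   # skip the finalizer so pvc's survive CR deletion
+  # ... rest of the cluster spec
+```
+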
## Deletion of Orphan PVCs
-- Assume ingestion is kicked off on druid, the sts MiddleManagers nodes are scaled to a certain number of replicas, and when the ingestion is completed. The middlemanagers are scaled down to avoid costs etc.
+
+- Assume ingestion is kicked off on druid: the MiddleManager sts is scaled up to a certain number of replicas, and when the ingestion is completed, the MiddleManagers are scaled down to save costs.
- On scale down, the sts just terminates the pods it owns, not the pvc's. The pvc's are left orphaned and are of little or no use.
-- In such cases druid-operator supports deletion of pvc orphaned by the sts.
+- In such cases, druid-operator supports deletion of the pvc's orphaned by the sts.
- To enable this feature, users need to add a flag in the druid cluster spec: ```deleteOrphanPvc: true``` (see the sketch after this list).
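+
+A sketch of the relevant cluster-spec fragment (only the key named above is shown):
+
+```yaml
+spec:
+  deleteOrphanPvc: true   # clean up pvc's left behind by scaled-down sts replicas
+```
+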
## Rolling Deploy
+
- The operator supports ```rollingDeploy```: when set to ```true``` in the clusterSpec, the operator does incremental updates in the order mentioned [here](http://druid.io/docs/latest/operations/rolling-updates.html). A spec fragment is sketched after this list.
- In rollingDeploy each node is updated one by one, and in case any of the nodes goes into a pending/crashing state during the update, the operator halts the update and does not update the other nodes. This requires manual intervention.
-- Default updates and cluster creation is in parallel.
+- By default, updates and cluster creation happen in parallel.
- Regardless of whether rolling deploy is enabled, cluster creation always happens in parallel.
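+
+The corresponding cluster-spec fragment, as a sketch:
+
+```yaml
+spec:
+  rollingDeploy: true   # update druid nodes one at a time instead of in parallel
+```
+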
## Force Delete of Sts Pods
+
- During an upgrade, if the sts is set to ordered ready, the sts controller will not recover from a CrashLoopBackOff state. The issue is referenced [here](https://github.com/kubernetes/kubernetes/issues/67250), and here's a reference [doc](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#forced-rollback)
-- How operator solves this is using the ```forceDeleteStsPodOnError``` key, the operator will delete the sts pod if its in crashloopback state. Example Scenario: During upgrade, user rolls out a faulty configuration causing the historical pod going in crashing state, user rolls out a valid configuration, the new configuration will not be applied unless user manual delete pods, so solve this scenario operator shall delete the pod automatically without user intervention.
+- The operator solves this using the ```forceDeleteStsPodOnError``` key: the operator will delete the sts pod if it is in a CrashLoopBackOff state. Example scenario: during an upgrade, the user rolls out a faulty configuration that sends the historical pod into a crashing state, then rolls out a valid configuration; the new configuration will not be applied unless the user manually deletes the pods. To solve this scenario, the operator shall delete the pod automatically without user intervention (see the sketch after this list).
- ```NOTE: Users must be aware of this feature. There might be cases where the CrashLoopBackOff is caused by a probe failure, a faulty image etc; the operator shall keep deleting the pod on each reconcile loop. Default behavior is true.```
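+
+A sketch of the flag in the cluster spec:
+
+```yaml
+spec:
+  forceDeleteStsPodOnError: true   # delete crashlooping sts pods on each reconcile
+```
+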
## Scaling of Druid Nodes
+
+- The operator supports the ```HPA autoscaling/v2``` spec in the nodeSpec for druid nodes. In case an HPA is deployed, the HPA controller maintains the replica count/state for the particular statefulset referenced. Refer to ```examples.md``` for HPA configuration; a sketch also follows below.
- ```NOTE: It is preferred to scale only brokers using HPA.```
- For scaling MiddleManagers, it is recommended not to use HPA. Refer to the discussions below, which address the issues in detail.
-1. https://github.com/apache/druid/issues/8801#issuecomment-664020630
-2. https://github.com/apache/druid/issues/8801#issuecomment-664648399
+
+1. https://github.com/apache/druid/issues/8801#issuecomment-664020630
+2. https://github.com/apache/druid/issues/8801#issuecomment-664648399
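+
+A hedged sketch of a broker HPA in the nodeSpec (assuming the field is named `hpAutoscaler` as in `examples.md`; the sts name and metric values are hypothetical):
+
+```yaml
+nodes:
+  brokers:
+    nodeType: "broker"
+    druid.port: 8088
+    replicas: 2
+    hpAutoscaler:
+      minReplicas: 2
+      maxReplicas: 8
+      scaleTargetRef:
+        apiVersion: apps/v1
+        kind: StatefulSet
+        name: druid-my-druid-cluster-brokers   # hypothetical generated sts name
+      metrics:
+        - type: Resource
+          resource:
+            name: cpu
+            target:
+              type: Utilization
+              averageUtilization: 60
+```
+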
## Volume Expansion of Druid Nodes Running As StatefulSets
+
```NOTE: This feature has been tested only on cloud environments and storage classes which support volume expansion. This feature uses the cascade=orphan strategy to make sure only the StatefulSet is deleted and recreated and pods are not deleted.```
+
- Druid nodes, specifically historicals, run as statefulsets. Each statefulset replica has a pvc attached.
- NodeSpec in the druid CR has the key ```volumeClaimTemplates``` where users can define the pvc's storage class as well as size.
- In case a user wants to increase the volume size for a node, the statefulsets cannot be directly updated.
- Behind the scenes, the Druid Operator performs a seamless update of the statefulset and patches the pvc's with the desired size defined in the druid CR.
-- Druid operator shall perform a cascade deletion of the sts, and shall patch the pvc. Cascade deletion has no affect to the pods running, queries are served and no downtime is experienced.
+- The druid operator shall perform a cascade deletion of the sts and shall patch the pvc's. Cascade deletion has no effect on the running pods; queries are served and no downtime is experienced.
- While enabling this feature, the druid operator will check if volume expansion is supported in the storage class mentioned in the druid CR; only then will it perform the expansion.
-- Shrinkage of pvc's isnt supported, desiredSize cannot be less than currentSize as well as counts.
+- Shrinking of pvc's isn't supported; **desiredSize cannot be less than currentSize, and the same applies to the volume counts**.
- To enable this feature, ```scalePvcSts``` needs to be set to ```true``` (see the sketch after this list).
- By default, this feature is disabled.
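+
+A sketch of a historical nodeSpec sized for expansion (the storage class and sizes are illustrative; the storage class must support volume expansion):
+
+```yaml
+spec:
+  scalePvcSts: true   # opt in to sts recreation plus pvc patching
+  nodes:
+    historicals:
+      nodeType: "historical"
+      volumeClaimTemplates:
+        - metadata:
+            name: historical-volume
+          spec:
+            accessModes:
+              - ReadWriteOnce
+            storageClassName: gp2   # must allow volume expansion
+            resources:
+              requests:
+                storage: 60Gi   # raise this value in the CR to expand; shrinking is not supported
+```
+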
## Add Additional Containers in Druid Nodes
+
- The Druid operator supports additional containers to run along with the druid services. This helps support co-located, co-managed helper processes for the primary druid application.
+- This can be used for init containers, sidecars, proxies etc.
+- To enable this feature, users just need to add a new container to the container list (see the sketch after this list).
+- This is scoped at the cluster level only, which means the additional container will be common to all the nodes.
- This can be used for init containers or sidecars or proxies etc.
- To enable this features users just need to add a new container to the container list
- This is scoped at cluster scope only, which means that additional container will be common to all the nodes
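+
+A hedged sketch of the cluster-scoped list (the `additionalContainer` field name and its exact shape should be checked against `druid_types.go`; the sidecar itself is hypothetical):
+
+```yaml
+spec:
+  additionalContainer:
+    - name: config-watcher   # hypothetical sidecar
+      image: busybox:1.36
+      command: ["sh", "-c", "tail -f /dev/null"]
+```
+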
@@ -181,4 +195,4 @@ All the probes definitions are documented bellow:
timeoutSeconds: 10
```
-
\ No newline at end of file
+
diff --git a/docs/getting_started.md b/docs/getting_started.md
index a6d9bfc3..337b033a 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -1,21 +1,27 @@
## Install the operator
```bash
+# This will deploy a kind cluster to test the stack locally
+make kind
# This will deploy the operator into the druid-operator-system namespace
make deploy
-# Check the deployed druid-operator
+# Check the deployed druid-operator in the druid-operator-system namespace
kubectl describe deployment -n druid-operator-system druid-operator-controller-manager
```
The operator can be deployed with namespaced scope or cluster scope. By default, the operator is namespace-scoped.
For the operator to be cluster scope, do the following changes:
+
+- Edit the `config/default/kustomization.yaml` so the `patchesStrategicMerge:` section will look like this:
+
```yaml
patchesStrategicMerge:
- manager_auth_proxy_patch.yaml
- manager_config_patch.yaml
```
+
- Edit the `config/default/manager_config_patch.yaml` to look like this:
+
```yaml
apiVersion: apps/v1
kind: Deployment
@@ -33,28 +39,33 @@ spec:
```
## Install the operator using Helm chart
+
- Install cluster scope operator into the `druid-operator-system` namespace:
+
```bash
# Install Druid operator using Helm
-helm -n druid-operator-system install --create-namespace cluster-druid-operator ./chart
+helm -n druid-operator-system upgrade -i --create-namespace cluster-druid-operator ./chart
# ... or generate manifest.yaml to install using other means:
helm -n druid-operator-system template --create-namespace cluster-druid-operator ./chart > manifest.yaml
```
- Install namespaced operator into the `druid-operator-system` namespace:
+
```bash
# Install Druid operator using Helm
-helm -n druid-operator-system install --create-namespace --set env.WATCH_NAMESPACE="mynamespace" namespaced-druid-operator ./chart
+kubectl create ns mynamespace
+helm -n druid-operator-system upgrade -i --create-namespace --set env.WATCH_NAMESPACE="mynamespace" namespaced-druid-operator ./chart
# you can use myvalues.yaml instead of --set
-helm -n druid-operator-system install --create-namespace -f myvalues.yaml namespaced-druid-operator ./chart
+helm -n druid-operator-system upgrade -i --create-namespace -f myvalues.yaml namespaced-druid-operator ./chart
# ... or generate manifest.yaml to install using other means:
helm -n druid-operator-system template --set env.WATCH_NAMESPACE="" namespaced-druid-operator ./chart --create-namespace > manifest.yaml
```
- Update settings, upgrade or rollback:
+
```bash
# To upgrade chart or apply changes in myvalues.yaml
helm -n druid-operator-system upgrade -f myvalues.yaml namespaced-druid-operator ./chart
@@ -64,6 +75,7 @@ helm -n druid-operator-system rollback cluster-druid-operator
```
- Uninstall operator
+
```bash
# To avoid destroying existing clusters, helm will not uninstall its CRD. For
# complete cleanup annotation needs to be removed first:
@@ -89,8 +101,6 @@ Note that above tiny-cluster only works on a single node kubernetes cluster(e.g.
## Debugging Problems
- - For kubernetes version 1.11 make sure to disable ```type: object``` in the CRD root spec.
-
```bash
# get druid-operator pod name
kubectl get po | grep druid-operator