
Add Support for volumeClaimTemplates for Logstash controller #6884

Merged: 20 commits into elastic:main on Jul 3, 2023

Conversation

@robbavey (Member) commented Jun 6, 2023

This commit adds support for persistent volumes to the Logstash operator:

Syntax for volumeClaimTemplates:
 volumeClaimTemplates:
    - metadata:
        name: logstash-data # Do not change this name unless you set up a volume mount for the data path.
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 2Gi
    - metadata:
        name: pq
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi

By default, a logstash-data PersistentVolumeClaim is created and mounted at /usr/share/logstash/data for Logstash's persistent storage, typically used by plugins. By default, the logstash-data claim is a small (1Gi) volume using the default StorageClass of the K8s provider, but it can be overridden by adding a spec.volumeClaimTemplates entry with the name logstash-data.

Separate storage (e.g. for Logstash configurations using PQs and/or DLQs) can be added by including an additional spec.volumeClaimTemplates entry along with a corresponding spec.podTemplate.spec.containers[].volumeMounts entry for each requested volume:

Sample Logstash config with separate volume for persistent queue
apiVersion: logstash.k8s.elastic.co/v1alpha1
kind: Logstash
metadata:
  name: logstash
spec:
  count: 1
  version: 8.7.0
  podTemplate:
    spec:
      containers:
      - name: logstash
        volumeMounts:
          - mountPath: /usr/share/logstash/pq
            name: pq
            readOnly: false
  volumeClaimTemplates:
    - metadata:
        name: pq
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
  config: 
    log.level: info
    queue.type: persisted
    path.queue: /usr/share/logstash/pq

This commit also performs a mild refactor, moving volume-related code into the new volume package that this commit adds.

Note that, as it currently stands, this PR makes a change to the StatefulSet spec, which will require any existing Logstash resources created in the technical preview to be recreated.

This PR also does not support resizing volumes by updating the storage value in spec.volumeClaimTemplates, even with storage classes that support automatic volume expansion. The user has to do this manually by deleting the existing StatefulSet with cascading deletion turned off to avoid pod deletion.
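For illustration, a rough sketch of those manual steps with kubectl; the names logstash-ls, logstash-data-logstash-ls-0 and logstash.yaml are assumptions (following the <name>-ls StatefulSet naming seen later in this thread), and the patch step applies only where the StorageClass supports expansion:

# 1. Grow the existing claim directly (only possible if the StorageClass allows expansion).
kubectl patch pvc logstash-data-logstash-ls-0 \
  -p '{"spec":{"resources":{"requests":{"storage":"5Gi"}}}}'

# 2. Delete the StatefulSet with cascading deletion turned off, so the pods keep
#    running; the operator then recreates the StatefulSet.
kubectl delete statefulset logstash-ls --cascade=orphan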

Documentation will be added in a follow-on PR or in an additional commit to this PR.

@botelastic botelastic bot added the triage label Jun 6, 2023
@thbkrkr thbkrkr added >enhancement Enhancement of existing functionality v2.9.0 labels Jun 6, 2023
@botelastic botelastic bot removed the triage label Jun 6, 2023
@kaisecheng kaisecheng self-requested a review June 7, 2023 10:56
@kaisecheng (Contributor) left a comment:

I am testing the PVC. When I delete the Logstash resource (kubectl delete logstash logstash-sample), the PVCs do not get deleted; instead, they show the status Bound:

$ kubectl get pvc
NAME                                         STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
elasticsearch-data-monitoring-es-default-0   Terminating   pvc-b584f2d8-f506-435e-aa9b-9382366ffadc   1Gi        RWO            standard       12m
logstash-data-logstash-sample-ls-0           Bound         pvc-b1ee74e6-b9a8-4a13-8fb2-bce88b3b1484   1Gi        RWO            standard       3m7s

The PVC of Elasticsearch is gone. I think Logstash needs to clean up as well.

I tried to update volumeClaimTemplates to 2Gi in logstash.yml. The PVC doesn't get updated; there is no error, but it stays at 1Gi. I need to delete the PVC manually and re-apply logstash.yml to get 2Gi. Is this expected? Edit: this is the same limitation as described in the PR description.

The rest of the suggestions are about naming.

Resolved review threads (outdated): 4 on pkg/controller/logstash/volume/volumes.go, 1 on config/samples/logstash/logstash.yaml
Resolved review thread: pkg/controller/logstash/volume/defaults.go
@thbkrkr (Contributor) commented Jun 7, 2023

The PVC of Elasticsearch is gone. I think Logstash needs to clean up as well.

@kaisecheng @robbavey

It may well make sense to implement the same behaviour as for ES, where the user can control whether the volumes should be deleted or preserved on Logstash deletion?

Check elastic.co/guide/cloud-on-k8s/k8s-volume-claim-templates.html, and:

func reconcilePVCOwnerRefs(ctx context.Context, c k8s.Client, es esv1.Elasticsearch) error {
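For context, the ES-side behaviour being referenced is controlled by spec.volumeClaimDeletePolicy on the Elasticsearch resource; a minimal sketch of its use (cluster name and version are illustrative):

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: monitoring
spec:
  version: 8.8.0
  # DeleteOnScaledownAndClusterDeletion (the default) removes PVCs with the cluster;
  # DeleteOnScaledownOnly preserves them when the cluster is deleted.
  volumeClaimDeletePolicy: DeleteOnScaledownOnly
  nodeSets:
    - name: default
      count: 1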

@robbavey (Member, Author) commented Jun 7, 2023

Our use case is slightly different from Elasticsearch's, I think: deleting persistent volumes has a more profound effect if PQ/DLQ is used, as we do not replicate data, and deleting volume(s) may lose in-flight data if the queues were not drained.
But yes, I think adding something along the same lines as Elasticsearch to give users control of their deletion policy makes sense - I'll take a look at what that entails here

The use of PQ/DLQ with clusters that automatically scale down has some potentially interesting side effects that we need to think through and document. We need to avoid scaling down Logstash pods that are still draining data from their PQs/DLQs, and avoid deleting volumes that still hold PQ (or DLQ) data. One thing we may consider is suggesting (or automatically setting) queue.drain: true and setting terminationGracePeriodSeconds artificially high to avoid Logstash nodes getting killed before draining completes; see the sketch below.
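A hedged sketch of that suggestion (not something this PR implements; the grace period value is illustrative):

spec:
  config:
    queue.type: persisted
    queue.drain: true   # block shutdown until the persistent queue is fully drained
  podTemplate:
    spec:
      terminationGracePeriodSeconds: 3600   # generous budget so draining can complete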

I know we'll need to create some good documentation on how to use volumes with Logstash depending on the use case, along the lines of the storage recommendations docs that the Elasticsearch operator has.

(A comment from kaisecheng was marked as resolved.)

@pebrc (Collaborator) left a comment:

Code LGTM. One thing I believe we need to do here is to set a securityContext with fsGroup: 1000 to make the data volume writable for Logstash; otherwise it won't be able to start.
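For illustration, a minimal sketch of the equivalent setting applied from the manifest side via the podTemplate (the operator-side default landed later in this PR, in commit 4f70c97):

spec:
  podTemplate:
    spec:
      securityContext:
        fsGroup: 1000   # Logstash runs as UID/GID 1000; this makes mounted volumes group-writable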

Volume expansion, maybe similar to what we do for Elasticsearch, would indeed be a nice addition to this, especially given that there is no workaround other than deleting the StatefulSet. In ES, users can just create a new node set with more storage if they need it or if their storage class does not support expansion.

Resolved review threads (outdated): pkg/controller/logstash/volume/volumes.go, pkg/controller/logstash/initcontainer.go, pkg/controller/logstash/config.go, config/samples/logstash/logstash.yaml
@thbkrkr thbkrkr self-assigned this Jun 27, 2023
@thbkrkr (Contributor) commented Jun 29, 2023

I have applied your feedback:

  • d3697e4 apply renaming review feedbacks
  • 4f70c97 set PodSecurityContext with fsGroup: 1000
  • 77add7a validate no PVCs change on update
  • 3e02922, 8a7f082 add a sample with logstash-data and pq PVCs
  • 8965ed8 (additionally) refactor BuildVolumes to define all volumes and volumeMounts in one place

It is ready for another round of reviews.

Manifests to test docker.elastic.co/eck-ci/eck-operator-pr:6884-8a7f0829: crds.yaml, operator.yaml, ls.yaml.

Testing the validation error when updating a PV storage size:
> k apply -f ls.yml

elasticsearch.elasticsearch.k8s.elastic.co/z unchanged
Error from server (Forbidden): error when applying patch:
{"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{"apiVersion":"logstash.k8s.elastic.co/v1alpha1","kind":"Logstash","metadata":{"annotations":{},"name":"z","namespace":"default"},"spec":{"config":{"api.http.host":"0.0.0.0","log.level":"info","queue.type":"memory"},"count":1,"elasticsearchRefs":[{"clusterName":"z","name":"z"}],"pipelines":[{"config.string":"input { exec { command =\u003e 'uptime' interval =\u003e 10 } }\noutput {\n elasticsearch {\n hosts =\u003e [ \"${Z_ES_HOSTS}\" ]\n ssl =\u003e true\n cacert =\u003e \"${Z_ES_SSL_CERTIFICATE_AUTHORITY}\"\n user =\u003e \"${Z_ES_USER}\"\n password =\u003e \"${Z_ES_PASSWORD}\"\n }\n}\n","pipeline.id":"main"}],"version":"8.6.0","volumeClaimTemplates":[{"metadata":{"name":"logstash-data"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"5Gi"}}}}]}}\n"}},"spec":{"volumeClaimTemplates":[{"metadata":{"name":"logstash-data"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"5Gi"}}}}]}}
to:
Resource: "logstash.k8s.elastic.co/v1alpha1, Resource=logstashes", GroupVersionKind: "logstash.k8s.elastic.co/v1alpha1, Kind=Logstash"
Name: "z", Namespace: "default"
for: "../lsm.yml": error when patching "../lsm.yml": admission webhook "elastic-logstash-validation-v1alpha1.k8s.elastic.co" denied the request: Logstash.logstash.k8s.elastic.co "z" is invalid: spec.volumeClaimTemplates: Invalid value: []v1.PersistentVolumeClaim{v1.PersistentVolumeClaim{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"logstash-data", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:v1.PersistentVolumeClaimSpec{AccessModes:[]v1.PersistentVolumeAccessMode{"ReadWriteOnce"}, Selector:(*v1.LabelSelector)(nil), Resources:v1.ResourceRequirements{Limits:v1.ResourceList(nil), Requests:v1.ResourceList{"storage":resource.Quantity{i:resource.int64Amount{value:5368709120, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"5Gi", Format:"BinarySI"}}, Claims:[]v1.ResourceClaim(nil)}, VolumeName:"", StorageClassName:(*string)(nil), VolumeMode:(*v1.PersistentVolumeMode)(nil), DataSource:(*v1.TypedLocalObjectReference)(nil), DataSourceRef:(*v1.TypedObjectReference)(nil)}, Status:v1.PersistentVolumeClaimStatus{Phase:"", AccessModes:[]v1.PersistentVolumeAccessMode(nil), Capacity:v1.ResourceList(nil), Conditions:[]v1.PersistentVolumeClaimCondition(nil), AllocatedResources:v1.ResourceList(nil), ResizeStatus:(*v1.PersistentVolumeClaimResizeStatus)(nil)}}}: Volume claim templates cannot be modified

A flaw of being conservative with the PVs is that if you create a Logstash resource, delete it, and then recreate it with a different PVC storage size, the PVC/PV are not recreated.
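A hedged example of the manual cleanup for that case, assuming the Logstash resource named z from the test above and the <claim>-<statefulset>-<ordinal> PVC naming convention:

# remove the leftover claim so the recreated Logstash gets a fresh PV at the new size
kubectl delete pvc logstash-data-z-ls-0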

@thbkrkr (Contributor) commented Jun 29, 2023

buildkite test this -f p=gke,E2E_TAGS=logstash -m s=8.8.0,s=8.9.0-SNAPSHOT 🟢

# 8.9.0-SNAPSHOT
TestLogstashEsOutput 3m38.98s
TestLogstashPipelineReload 1m9.71s
TestLogstashServerVersionUpgradeToLatest8x 2m54.85s
TestLogstashStackMonitoring 1m29.57s
TestLogstashWithCustomService 1m0.43s
TestLogstashWithCustomServiceAndAmendedApi 1m3.49s
TestLogstashWithReworkedApiService 1m0.39s
TestMultipleLogstashes 1m9.41s
TestPipelineConfigLogstash 54.42s
TestPipelineConfigRefLogstash 1m24.47s
TestSingleLogstash 1m3.39s

@pebrc (Collaborator) left a comment:

LGTM

@thbkrkr thbkrkr enabled auto-merge (squash) July 3, 2023 13:49
@thbkrkr thbkrkr merged commit d6610c8 into elastic:main Jul 3, 2023
@pebrc pebrc changed the title Add Support for volumeClaimTemplates for Logstash operator Add Support for volumeClaimTemplates for Logstash controller Jul 10, 2023
Labels: >enhancement (Enhancement of existing functionality), :logstash, v2.9.0