Logstash add ElasticsearchRefs #6662

kaisecheng · 2023-04-05T16:14:48Z

This PR allows configuration of elasticsearchRefs in CRD to associate Elasticsearch clusters to Logstash. The connectivity info/ credentials are passed via environment variable to Logstash container for referencing in Logstash plugins: logstash-output-elasticsearch, logstash-input-elasticsearch and logstash-filter-elasticsearch.

Sample config

apiVersion: logstash.k8s.elastic.co/v1alpha1
kind: Logstash
metadata:
  name: logstash-sample
spec:
  elasticsearchRefs:
    - clusterName: production
      name: elasticsearch-sample
      namespace: spaceship

Environment variables taking $clusterName_ES_ as prefix are available in Logstash container

PRODUCTION_ES_HOSTS
PRODUCTION_ES_USERNAME
PRODUCTION_ES_PASSWORD
PRODUCTION_ES_SSL_CERTIFICATE_AUTHORITY # path of ca.crt of production cluster

Sample Logstash pipeline with elasticsearch output using environment variables

- pipeline.id: main
  config.string: |
    output { 
     elasticsearch {
       hosts => "${PRODUCTION_ES_HOSTS}"
       user => "${PRODUCTION_ES_USERNAME}"
       password => "${PRODUCTION_ES_PASSWORD}" 
       cacert => "${PRODUCTION_ES_SSL_CERTIFICATE_AUTHORITY}"
      } 
    }

The sample logstash yml file as located in config/samples/logstash/logstash_es.yaml will
create:

NAMESPACE  NAME                                                                   READY  REASON  AGE
default    Logstash/logstash-sample                                               -              4m41s
default    ├─Secret/logstash-sample-default-elasticsearch-sample-logstash-user    -              2m54s
default    ├─Secret/logstash-sample-logstash-es-default-elasticsearch-sample-ca   -              2m54s
default    ├─Secret/logstash-sample-ls-config                                     -              2m53s
default    ├─Service/logstash-sample-ls-api                                       -              2m53s
default    │ └─EndpointSlice/logstash-sample-ls-api-vpxv5                         -              2m53s
default    └─StatefulSet/logstash-sample-ls                                       -              2m53s
default      ├─ControllerRevision/logstash-sample-ls-c6dc8b867                    -              2m53s
default      └─Pod/logstash-sample-ls-0                                           True           2m53s

e2e test

TestLogstashEsOutput

Still todo:

add rbac

Co-authored-by: Rob Bavey rob.bavey@elastic.co

- fix e2e TestSamples - check ES status - check ES association secrets - check association volume - check association env

…h_esrefs # Conflicts: # pkg/controller/logstash/driver.go # pkg/controller/logstash/pod.go # test/e2e/test/logstash/builder.go

robbavey

This looks fantastic. I do have a couple of broad questions before I dive in fully

robbavey · 2023-04-11T19:15:28Z

pkg/controller/elasticsearch/user/roles.go

@@ -150,6 +152,22 @@ var (
 				},
 			},
 		},
+		LogstashUserRole: esclient.Role{


The indices chosen here will restrict logstash to only being able to create the "standard" indices/templates, and not custom ones created by users.

I notice that agent and enterprise search use a superuser role here, presumably to get over the same limitation. I am curious to hear from other folks here as to the best approach.

cc @pebrc @jsvd

The indices chosen here will restrict logstash to only being able to create the "standard" indices/templates, and not custom ones created by users.

That's a good point. New roles can be added using the roles section, and even if it is not ideal this can also be used to override predefined roles.

I feel like the uncommented/explained use of superuser roles by Agent and Enterprise Search is actually somewhat of a bug. Even if there is a motivation for it in order to create and ingest data into custom indices, it feels like something that should be called out in a comment/doc at the very least and probably a custom role with appropriate privileges to create arbitrary indices would have still been better than superuser.

robbavey · 2023-04-11T21:26:18Z

pkg/apis/logstash/v1alpha1/logstash_types.go

@@ -31,6 +35,10 @@ type LogstashSpec struct {
 	// +kubebuilder:validation:Optional
 	Image string `json:"image,omitempty"`

+	// ElasticsearchRef is a reference to an Elasticsearch cluster running in the same Kubernetes cluster.
+	// +kubebuilder:validation:Optional
+	ElasticsearchRefs []commonv1.ObjectSelector `json:"elasticsearchRefs,omitempty"`


Thinking forward a little to when we will have named clusters in Logstash, we talked about introducing the notion of a clusterName into the ElasticsearchRef definition along the lines of

elasticsearchRefs: - clusterName: k8sData name: local namespace: production

with the clusterName being used as the id in the cluster.yaml equivalent that we envisage for Logstash.
If we think about a cluster.yaml looking something like:

elasticsearch_clusters: k8sData: host: "https://metaldata-es-http:9200" username: “logstash_system password: REDACTED ca_file: /mnt/elastic-internal/es-certs/ca.crt

I think we could do this by creating the LogstashSpec to look like:

type LogstashSpec struct { : // Named list of Elasticsearch Clusters ElasticsearchRefs[] ElasticsearchCluster `json:"elasticsearchRefs,omitempty"` : } // Definition of type to define an Elasticsearch Cluster to be referred to in // Logstash pipeline type ElasticsearchCluster struct { commonv1.ObjectSelector `json:",omitempty,inline"` ClusterName string `json:"clusterName,omitempty"` }

We could potentially create the env variables with the clusterName? (And if we did would we need the namespace here?)

There is some prior art with the elastic agent using an outputName to distinguish between multiple elasticsearchRefs.

Alternatively, we could stick with what we have here, and write cluster.yaml as:

elasticsearch_clusters: local_production: host: "https://metaldata-es-http:9200" username: “logstash_system password: REDACTED ca_file: /mnt/elastic-internal/es-certs/ca.crt

using $namespace_$name as the id.

WDYT?

These are awesome suggestions. +1 to follow the prior art of agent to unify the experience. Also, +1 to bring cluster name in technical preview.

I think we can use outputName instead of clusterName to further share the experience. For the env variable, let's replace $namespace_$name with outputName. The name needs to be unique in elasticsearchRefs. I will add validation.

My thought on using clusterName over outputName is that we also use want to use elasticsearchRef in input and filter plugins, not just for outputs

@robbavey Revisiting the usage of clusterName. I have doubts about its value. One drawback is the stack monitoring should understand clusterName and send metrics to the corresponding cluster, which does not feasible now because the operator only considers namespace + name in the association.

From the design doc, things we want to achieve (now and future cluster.yml) with clusterName seem can be replaced by namespace + name.

elasticsearch { cluster_id => namespace_name }

elasticsearch_clusters: namespace_name: host: "https://metaldata-es-http:9200" username: “logstash_system password: REDACTED ca_file: /mnt/elastic-internal/es-certs/ca.crt namespace_name2: cloud_id: XXXXXXX api_key: yyyyyy

clusterName works like a nickname that is handy when user wants to give a second name.
I wanna keep the current env variable naming convention, namespace + name, at least in the technical preview.

I tend to agree with @barkbay that the lack of clusterName in elasticsearchRef in stack monitoring is probably ok - clusterName is added as an aid for the user to refer to an elasticsearch cluster pointed to by elasticsearchRef outside of the kubernetes CRD in pipeline configurations, without needing to deal with kubernetes semantics, such as namespaces. The ability to define/move namespaces outside of the CRD is an an interesting twist on this.

Could this be an optional feature? If users define it it will be used to name the environment variables? Also could it then just be named alias?

elasticsearchRefs: - alias: o11y-cluster name: monitor namespace: production

I think I also agree that stack monitoring is different enough to allow a special case here. Because users have to understand the relationship between the elasticsearchRef and what they define in configuration there is a motivation for such a construct. Why would one want to use an alias for stack monitoring?

In Logstash context, clusterName is a mapping of name to endpoint (host, username, password, cacert). The endpoint can be an input (source) or an output (destination). It will be referenced in pipeline.

input { kafka { } } output { elasticsearch { cluster_name => "production" } elasticsearch { cluster_name => "uat" } }

In Logstash CRD, we allow users to define the name mapping

elasticsearchRefs: - clusterName: production name: es1 - clusterName: uat name:es2 - clusterName: monitor name:es3

However, Stack Monitoring does not understand the name production, uat and monitor. If Stack Monitoring wants to insert metrics to monitor, it needs the following

monitoring: metrics: elasticsearchRefs: - name: es3

As a user, I am confused why I can't use monitor that I declared earlier. The name works in pipeline but not in Stack Monitoring.

IMO, ultimately we will need clusterName to achieve cluster.yml which contains the mapping of alias to endpoint. In technical preview, it seems not ready. I hope the example can explain the confusion.

I guess my confusion here, is what would be your expectation of using clusterName in the monitoring section?

Are you expecting to be able to use clusterName instead of name + namespace? Or as well as? Is your expectation that you could use

monitoring: metrics: elasticsearchRefs: - clusterName: monitor

or that you would expect

monitoring: metrics: elasticsearchRefs: - clusterName: monitor name:es3

or that clusterName is arbitrary without the specific cluster or clusterName setting in cluster.yml when that comes to be?

I agree that it feels a little half baked without the cluster.yml, and we only refer to clusterName in the env variable, but I do feel that it will have utility when we start allowing named clusters, and maybe introducing the notion earlier is of some value.

Thoughts @flexitrev, @jsvd, @roaksoax?

my expectation is to be able to use clusterName to output metrics to monitor

monitoring: metrics: elasticsearchRefs: - clusterName: monitor

and won't stop user to use namespace + name to output

monitoring: metrics: elasticsearchRefs: - name: es3

The declaration in spec.elasticsearchRefs will all convert to cluster.yml. Only the clusterName in cluster.yml can be referenced in pipeline.

elasticsearchRefs: - clusterName: production name: es1 - clusterName: uat name:es2 - clusterName: monitor name:es3

cluster.yml

- production: host:... username:... password:... cacert:... - uat: host:... username:... password:... cacert:... - monitor: host:... username:... password:... cacert:...

…h_esrefs # Conflicts: # pkg/controller/logstash/pod.go

kaisecheng · 2023-04-13T12:29:05Z

@barkbay I think this is ready for review

There is a discussion about bringing clusterName to elasticsearchRefs. The decision is not to add clusterName in the technical preview.

robbavey · 2023-04-14T14:42:17Z

pkg/controller/elasticsearch/user/roles.go

+				{
+					Names:      []string{"logstash-*", "ecs-logstash-*", "logs-*", "metrics-*", "synthetics-*", "traces-*"},
+					Privileges: []string{"manage", "read", "create_doc", "view_index_metadata", "create_index"},
+				},


I think we need more permissions here, in terms of both index names, and privileges.
I tried a few different simple scenarios, and there were a few issues -
Removing data_streams => false will write to logstash, and ecs_compatibility => disabled will write to ecs-logstash by default, so we will need to add those to the list of Names
Additionally, I think we need to add additional privileges - we get a

[indices:data/write/index:op_type/index] is unauthorized for user [default-logstash-sample-default-elasticsearch-logstash-user] with effective roles [eck_logstash_user_role] on indices [ecs-logstash], this action is granted by the index privileges [create,index,write,all]"

error when attempting to write to this index. I suspect we will need the write privilege here - the elasticsearch output is able to do create, index, update and delete actions
If we are going to limit the indices here, we will need to add instructions on how users can get around this

Thanks for double check it. I have updated the setting.
ecs-logstash and logstash are required by ilm_rollover_alias
The logstash doc regarding secure connection needs update

Thank you! This now works as expected.

robbavey

This looks good to me from a logstash perspective.

My only concern is around the experience when referring to an index that is not one of the "default" ones, and how we choose to document this.

robbavey · 2023-04-14T21:11:40Z

pkg/controller/logstash/env_test.go

+					AuthSecretName: "logstash-sample-default-elasticsearch-sample-logstash-user",
+					AuthSecretKey:  "default-logstash-sample-default-elasticsearch-sample-logstash-user",
+					CACertProvided: false,
+					CASecretName:   "logstash-sample-logstash-es-default-elasticsearch-sample-ca",


Would we expect this to be created if we are not using tls?

barkbay · 2023-04-17T08:01:11Z

@barkbay I think this is ready for review

Sure, I'll do a first review today.

barkbay

I left a few comments, I still have to do some tests.

pkg/apis/logstash/v1alpha1/logstash_types.go

barkbay · 2023-04-17T12:04:53Z

pkg/controller/logstash/env.go

+
+func getEsRefNamespacedName(assoc commonv1.Association) string {
+	ref := assoc.AssociationRef()
+	return fmt.Sprintf("%s_%s", ref.Namespace, ref.Name)


ref.Name may be empty if user provided connection settings through a Secret:

apiVersion: logstash.k8s.elastic.co/v1alpha1 kind: Logstash metadata: name: logstash-sample spec: count: 1 version: 8.7.0 elasticsearchRefs: - secretName: external-cloud-es-ref pipelines: {} --- apiVersion: v1 kind: Secret metadata: name: external-cloud-es-ref stringData: url: https://my-deployment.es.europe-west1.gcp.cloud.es.io username: elastic password: REDACTED

This feature, introduced in ECK 2.2.0 is not documented (yet: #6449) but can be used by a user to connect a stack component to an [external/non ECK managed] Elasticsearch cluster.

I'm not sure what is the best way to handle this case if we want to inject connection settings using env. variables.

Great that you raise the issue. We can address the external ES connection in a follow-up. I am thinking of using ref.NameOrSecretName() for env. In the future, the operator will need to write managed and unmanaged cluster info to a cluster.yml. Logstash will need to map the cluster.yml to plugin config.

In the technical preview, we aim for managed clusters.

add an issue for followup https://github.com/elastic/ingest-dev/issues/1764

Should we make it a public issue so our users can find it?

pkg/controller/logstash/env.go

pkg/apis/logstash/v1alpha1/logstash_types.go

barkbay · 2023-04-17T13:16:24Z

pkg/controller/elasticsearch/user/roles.go

@@ -150,6 +152,22 @@ var (
 				},
 			},
 		},
+		LogstashUserRole: esclient.Role{


The indices chosen here will restrict logstash to only being able to create the "standard" indices/templates, and not custom ones created by users.

That's a good point. New roles can be added using the roles section, and even if it is not ideal this can also be used to override predefined roles.

Co-authored-by: Michael Morello <michael.morello@gmail.com>

…into logstash_esrefs

…to logstash_esrefs # Conflicts: # pkg/controller/logstash/pod.go

- change env variable prefix - add validation

robbavey

This is great - really close, I have a couple of nits on clarifying the messaging on missing clusterName

pkg/apis/logstash/v1alpha1/validations.go

pkg/apis/logstash/v1alpha1/validations_test.go

Co-authored-by: Rob Bavey <rob.bavey@elastic.co>

robbavey

LGTM from the logstash side

kaisecheng · 2023-04-24T19:57:04Z

@pebrc @barkbay I have added clusterName and gotten thumbs up from Rob. This is ready to review.

robbavey · 2023-04-24T21:59:55Z

@kaisecheng One thing I noticed, that I should have noticed earlier 🤦 - we use USERNAME as the suffix in the env variable, but the plugin settings use user for the configuration settings. What do you think about having USER for the env variable?

kaisecheng · 2023-04-25T09:39:52Z

@robbavey I have changed the suffix to USER

robbavey

LGTM

robbavey · 2023-04-26T12:30:22Z

@pebrc @barkbay This is ready for final review by the ECK team - I'm happy with this from the logstash side

barkbay · 2023-04-26T13:30:41Z

👀

barkbay

LGTM

pkg/apis/logstash/v1alpha1/logstash_types.go

barkbay · 2023-04-26T13:58:36Z

pkg/apis/logstash/v1alpha1/logstash_types.go

+// ElasticsearchCluster definition of type to define an Elasticsearch Cluster to be referred to in Logstash pipeline
+type ElasticsearchCluster struct {
+	commonv1.ObjectSelector `json:",omitempty,inline"`
+	ClusterName             string `json:"clusterName,omitempty"`


Do we want to prevent user from using a same clusterName for different clusters with a new validation?

would like to improve it in future

pkg/apis/logstash/v1alpha1/logstash_types.go

pkg/apis/logstash/v1alpha1/validations.go

Co-authored-by: Michael Morello <michael.morello@gmail.com>

add multiple elasticsearch refs

fb72c22

botelastic bot added the triage label Apr 5, 2023

kaisecheng added >enhancement Enhancement of existing functionality and removed triage labels Apr 5, 2023

kaisecheng added 2 commits April 5, 2023 18:08

fix test

2386409

add validation

a61d136

kaisecheng changed the base branch from main to feature/logstash April 5, 2023 19:54

kaisecheng added 6 commits April 6, 2023 19:31

add unit tests and update e2e

a760d82

- fix e2e TestSamples - check ES status - check ES association secrets - check association volume - check association env

Merge remote-tracking branch 'upstream/feature/logstash' into logstas…

113518d

…h_esrefs # Conflicts: # pkg/controller/logstash/driver.go # pkg/controller/logstash/pod.go # test/e2e/test/logstash/builder.go

merge pipelines

a538805

add env test

06892f0

add es-output e2e

68ac14e

add logstash user to association

7380323

kaisecheng marked this pull request as ready for review April 11, 2023 13:27

kaisecheng requested a review from robbavey April 11, 2023 13:28

robbavey reviewed Apr 11, 2023

View reviewed changes

kaisecheng added 5 commits April 12, 2023 17:54

Merge remote-tracking branch 'upstream/feature/logstash' into logstas…

d1b185d

…h_esrefs # Conflicts: # pkg/controller/logstash/pod.go

merge test

3b9608b

fix generate

ed24d91

fix roles test

de74b4f

clean up

e727f71

kaisecheng requested a review from barkbay April 13, 2023 12:18

robbavey requested changes Apr 14, 2023

View reviewed changes

fix index role

6eec1c9

kaisecheng requested a review from robbavey April 14, 2023 16:48

robbavey reviewed Apr 14, 2023

View reviewed changes

barkbay reviewed Apr 17, 2023

View reviewed changes

Update pkg/apis/logstash/v1alpha1/logstash_types.go

7303cd6

Co-authored-by: Michael Morello <michael.morello@gmail.com>

kaisecheng and others added 6 commits April 21, 2023 15:15

import format

cede486

Update pkg/controller/logstash/pod.go

65a57a6

Co-authored-by: Michael Morello <michael.morello@gmail.com>

Merge branch 'logstash_esrefs' of github.com:kaisecheng/cloud-on-k8s …

0724deb

…into logstash_esrefs

Merge branch 'feature/logstash' of github.com:elastic/cloud-on-k8s in…

af3b4d0

…to logstash_esrefs # Conflicts: # pkg/controller/logstash/pod.go

fix test

3d91ff3

- add clusterName

899b095

- change env variable prefix - add validation

kaisecheng requested a review from robbavey April 22, 2023 01:15

robbavey requested changes Apr 24, 2023

View reviewed changes

kaisecheng and others added 4 commits April 24, 2023 18:55

Update pkg/apis/logstash/v1alpha1/validations.go

258b690

Co-authored-by: Rob Bavey <rob.bavey@elastic.co>

Update pkg/apis/logstash/v1alpha1/validations_test.go

90f893b

Co-authored-by: Rob Bavey <rob.bavey@elastic.co>

Update pkg/apis/logstash/v1alpha1/validations_test.go

51a953a

Co-authored-by: Rob Bavey <rob.bavey@elastic.co>

Update pkg/apis/logstash/v1alpha1/validations_test.go

3304bd7

Co-authored-by: Rob Bavey <rob.bavey@elastic.co>

robbavey approved these changes Apr 24, 2023

View reviewed changes

kaisecheng requested review from pebrc, robbavey and barkbay April 24, 2023 19:54

change env var from USERNAME to USER

94c07c1

robbavey approved these changes Apr 25, 2023

View reviewed changes

barkbay approved these changes Apr 26, 2023

View reviewed changes

kaisecheng and others added 4 commits April 26, 2023 15:59

Update pkg/apis/logstash/v1alpha1/logstash_types.go

3c4c5f9

Co-authored-by: Michael Morello <michael.morello@gmail.com>

Update pkg/apis/logstash/v1alpha1/logstash_types.go

ae0df6e

Co-authored-by: Michael Morello <michael.morello@gmail.com>

Update pkg/apis/logstash/v1alpha1/validations.go

36b1182

Co-authored-by: Michael Morello <michael.morello@gmail.com>

fix generate

eda38a5

kaisecheng merged commit 69b5ef9 into elastic:feature/logstash Apr 26, 2023

thbkrkr added the :logstash label Jun 23, 2023

Logstash add ElasticsearchRefs #6662

Logstash add ElasticsearchRefs #6662

Conversation

kaisecheng commented Apr 5, 2023 • edited Loading

robbavey left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pebrc Apr 20, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pebrc Apr 20, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kaisecheng Apr 20, 2023 • edited Loading

Choose a reason for hiding this comment

kaisecheng commented Apr 13, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robbavey left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

barkbay commented Apr 17, 2023

barkbay left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robbavey left a comment

Choose a reason for hiding this comment

robbavey left a comment

Choose a reason for hiding this comment

kaisecheng commented Apr 24, 2023

robbavey commented Apr 24, 2023

kaisecheng commented Apr 25, 2023

robbavey left a comment

Choose a reason for hiding this comment

robbavey commented Apr 26, 2023

barkbay commented Apr 26, 2023

barkbay left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kaisecheng commented Apr 5, 2023 •

edited

Loading

pebrc Apr 20, 2023 •

edited

Loading

pebrc Apr 20, 2023 •

edited

Loading

kaisecheng Apr 20, 2023 •

edited

Loading