-
Notifications
You must be signed in to change notification settings - Fork 729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Logstash add ElasticsearchRefs #6662
Logstash add ElasticsearchRefs #6662
Conversation
- fix e2e TestSamples - check ES status - check ES association secrets - check association volume - check association env
…h_esrefs # Conflicts: # pkg/controller/logstash/driver.go # pkg/controller/logstash/pod.go # test/e2e/test/logstash/builder.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fantastic. I do have a couple of broad questions before I dive in fully
@@ -150,6 +152,22 @@ var ( | |||
}, | |||
}, | |||
}, | |||
LogstashUserRole: esclient.Role{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indices chosen here will restrict logstash to only being able to create the "standard" indices/templates, and not custom ones created by users.
I notice that agent
and enterprise search
use a superuser
role here, presumably to get over the same limitation. I am curious to hear from other folks here as to the best approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indices chosen here will restrict logstash to only being able to create the "standard" indices/templates, and not custom ones created by users.
That's a good point. New roles can be added using the roles
section, and even if it is not ideal this can also be used to override predefined roles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the uncommented/explained use of superuser
roles by Agent and Enterprise Search is actually somewhat of a bug. Even if there is a motivation for it in order to create and ingest data into custom indices, it feels like something that should be called out in a comment/doc at the very least and probably a custom role with appropriate privileges to create arbitrary indices would have still been better than superuser
.
@@ -31,6 +35,10 @@ type LogstashSpec struct { | |||
// +kubebuilder:validation:Optional | |||
Image string `json:"image,omitempty"` | |||
|
|||
// ElasticsearchRef is a reference to an Elasticsearch cluster running in the same Kubernetes cluster. | |||
// +kubebuilder:validation:Optional | |||
ElasticsearchRefs []commonv1.ObjectSelector `json:"elasticsearchRefs,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking forward a little to when we will have named clusters in Logstash, we talked about introducing the notion of a clusterName
into the ElasticsearchRef definition along the lines of
elasticsearchRefs:
- clusterName: k8sData
name: local
namespace: production
with the clusterName
being used as the id in the cluster.yaml
equivalent that we envisage for Logstash.
If we think about a cluster.yaml
looking something like:
elasticsearch_clusters:
k8sData:
host: "https://metaldata-es-http:9200"
username: “logstash_system
password: REDACTED
ca_file: /mnt/elastic-internal/es-certs/ca.crt
I think we could do this by creating the LogstashSpec
to look like:
type LogstashSpec struct {
:
// Named list of Elasticsearch Clusters
ElasticsearchRefs[] ElasticsearchCluster `json:"elasticsearchRefs,omitempty"`
:
}
// Definition of type to define an Elasticsearch Cluster to be referred to in
// Logstash pipeline
type ElasticsearchCluster struct {
commonv1.ObjectSelector `json:",omitempty,inline"`
ClusterName string `json:"clusterName,omitempty"`
}
We could potentially create the env variables with the clusterName? (And if we did would we need the namespace here?)
There is some prior art with the elastic agent using an outputName to distinguish between multiple elasticsearchRefs
.
Alternatively, we could stick with what we have here, and write cluster.yaml
as:
elasticsearch_clusters:
local_production:
host: "https://metaldata-es-http:9200"
username: “logstash_system
password: REDACTED
ca_file: /mnt/elastic-internal/es-certs/ca.crt
using $namespace_$name
as the id.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are awesome suggestions. +1 to follow the prior art of agent to unify the experience. Also, +1 to bring cluster name in technical preview.
I think we can use outputName
instead of clusterName
to further share the experience. For the env variable, let's replace $namespace_$name
with outputName
. The name needs to be unique in elasticsearchRefs
. I will add validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought on using clusterName
over outputName
is that we also use want to use elasticsearchRef
in input
and filter
plugins, not just for outputs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robbavey Revisiting the usage of clusterName
. I have doubts about its value. One drawback is the stack monitoring should understand clusterName
and send metrics to the corresponding cluster, which does not feasible now because the operator only considers namespace + name in the association.
From the design doc, things we want to achieve (now and future cluster.yml
) with clusterName seem can be replaced by namespace + name.
elasticsearch {
cluster_id => namespace_name
}
elasticsearch_clusters:
namespace_name:
host: "https://metaldata-es-http:9200"
username: “logstash_system
password: REDACTED
ca_file: /mnt/elastic-internal/es-certs/ca.crt
namespace_name2:
cloud_id: XXXXXXX
api_key: yyyyyy
clusterName
works like a nickname that is handy when user wants to give a second name.
I wanna keep the current env variable naming convention, namespace + name, at least in the technical preview.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree with @barkbay that the lack of clusterName
in elasticsearchRef
in stack monitoring is probably ok - clusterName
is added as an aid for the user to refer to an elasticsearch cluster pointed to by elasticsearchRef
outside of the kubernetes CRD in pipeline configurations, without needing to deal with kubernetes semantics, such as namespaces. The ability to define/move namespaces outside of the CRD is an an interesting twist on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be an optional feature? If users define it it will be used to name the environment variables? Also could it then just be named alias
?
elasticsearchRefs:
- alias: o11y-cluster
name: monitor
namespace: production
I think I also agree that stack monitoring is different enough to allow a special case here. Because users have to understand the relationship between the elasticsearchRef
and what they define in configuration there is a motivation for such a construct. Why would one want to use an alias for stack monitoring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Logstash context, clusterName
is a mapping of name to endpoint (host, username, password, cacert). The endpoint can be an input (source) or an output (destination). It will be referenced in pipeline.
input {
kafka { }
}
output {
elasticsearch { cluster_name => "production" }
elasticsearch { cluster_name => "uat" }
}
In Logstash CRD, we allow users to define the name mapping
elasticsearchRefs:
- clusterName: production
name: es1
- clusterName: uat
name:es2
- clusterName: monitor
name:es3
However, Stack Monitoring does not understand the name production
, uat
and monitor
. If Stack Monitoring wants to insert metrics to monitor
, it needs the following
monitoring:
metrics:
elasticsearchRefs:
- name: es3
As a user, I am confused why I can't use monitor
that I declared earlier. The name works in pipeline but not in Stack Monitoring.
IMO, ultimately we will need clusterName
to achieve cluster.yml
which contains the mapping of alias to endpoint. In technical preview, it seems not ready. I hope the example can explain the confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my confusion here, is what would be your expectation of using clusterName
in the monitoring
section?
Are you expecting to be able to use clusterName
instead of name
+ namespace
? Or as well as? Is your expectation that you could use
monitoring:
metrics:
elasticsearchRefs:
- clusterName: monitor
or that you would expect
monitoring:
metrics:
elasticsearchRefs:
- clusterName: monitor
name:es3
or that clusterName
is arbitrary without the specific cluster
or clusterName
setting in cluster.yml
when that comes to be?
I agree that it feels a little half baked without the cluster.yml
, and we only refer to clusterName
in the env variable, but I do feel that it will have utility when we start allowing named clusters, and maybe introducing the notion earlier is of some value.
Thoughts @flexitrev, @jsvd, @roaksoax?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my expectation is to be able to use clusterName
to output metrics to monitor
monitoring:
metrics:
elasticsearchRefs:
- clusterName: monitor
and won't stop user to use namespace + name to output
monitoring:
metrics:
elasticsearchRefs:
- name: es3
The declaration in spec.elasticsearchRefs
will all convert to cluster.yml
. Only the clusterName
in cluster.yml
can be referenced in pipeline.
elasticsearchRefs:
- clusterName: production
name: es1
- clusterName: uat
name:es2
- clusterName: monitor
name:es3
cluster.yml
- production:
host:...
username:...
password:...
cacert:...
- uat:
host:...
username:...
password:...
cacert:...
- monitor:
host:...
username:...
password:...
cacert:...
…h_esrefs # Conflicts: # pkg/controller/logstash/pod.go
@barkbay I think this is ready for review There is a discussion about bringing |
{ | ||
Names: []string{"logstash-*", "ecs-logstash-*", "logs-*", "metrics-*", "synthetics-*", "traces-*"}, | ||
Privileges: []string{"manage", "read", "create_doc", "view_index_metadata", "create_index"}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need more permissions here, in terms of both index names, and privileges.
I tried a few different simple scenarios, and there were a few issues -
Removing data_streams => false
will write to logstash
, and ecs_compatibility => disabled
will write to ecs-logstash
by default, so we will need to add those to the list of Names
Additionally, I think we need to add additional privileges - we get a
[indices:data/write/index:op_type/index] is unauthorized for user [default-logstash-sample-default-elasticsearch-logstash-user] with effective roles [eck_logstash_user_role] on indices [ecs-logstash], this action is granted by the index privileges [create,index,write,all]"
error when attempting to write to this index. I suspect we will need the write
privilege here - the elasticsearch output is able to do create
, index
, update
and delete
actions
If we are going to limit the indices here, we will need to add instructions on how users can get around this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for double check it. I have updated the setting.
ecs-logstash
and logstash
are required by ilm_rollover_alias
The logstash doc regarding secure connection needs update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! This now works as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me from a logstash perspective.
My only concern is around the experience when referring to an index that is not one of the "default" ones, and how we choose to document this.
pkg/controller/logstash/env_test.go
Outdated
AuthSecretName: "logstash-sample-default-elasticsearch-sample-logstash-user", | ||
AuthSecretKey: "default-logstash-sample-default-elasticsearch-sample-logstash-user", | ||
CACertProvided: false, | ||
CASecretName: "logstash-sample-logstash-es-default-elasticsearch-sample-ca", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we expect this to be created if we are not using tls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
Sure, I'll do a first review today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few comments, I still have to do some tests.
pkg/controller/logstash/env.go
Outdated
|
||
func getEsRefNamespacedName(assoc commonv1.Association) string { | ||
ref := assoc.AssociationRef() | ||
return fmt.Sprintf("%s_%s", ref.Namespace, ref.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ref.Name
may be empty if user provided connection settings through a Secret
:
apiVersion: logstash.k8s.elastic.co/v1alpha1
kind: Logstash
metadata:
name: logstash-sample
spec:
count: 1
version: 8.7.0
elasticsearchRefs:
- secretName: external-cloud-es-ref
pipelines: {}
---
apiVersion: v1
kind: Secret
metadata:
name: external-cloud-es-ref
stringData:
url: https://my-deployment.es.europe-west1.gcp.cloud.es.io
username: elastic
password: REDACTED
This feature, introduced in ECK 2.2.0 is not documented (yet: #6449) but can be used by a user to connect a stack component to an [external/non ECK managed] Elasticsearch cluster.
I'm not sure what is the best way to handle this case if we want to inject connection settings using env. variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great that you raise the issue. We can address the external ES connection in a follow-up. I am thinking of using ref.NameOrSecretName()
for env. In the future, the operator will need to write managed and unmanaged cluster info to a cluster.yml
. Logstash will need to map the cluster.yml
to plugin config.
In the technical preview, we aim for managed clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add an issue for followup https://github.com/elastic/ingest-dev/issues/1764
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make it a public issue so our users can find it?
@@ -150,6 +152,22 @@ var ( | |||
}, | |||
}, | |||
}, | |||
LogstashUserRole: esclient.Role{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indices chosen here will restrict logstash to only being able to create the "standard" indices/templates, and not custom ones created by users.
That's a good point. New roles can be added using the roles
section, and even if it is not ideal this can also be used to override predefined roles.
Co-authored-by: Michael Morello <michael.morello@gmail.com>
Co-authored-by: Michael Morello <michael.morello@gmail.com>
…into logstash_esrefs
…to logstash_esrefs # Conflicts: # pkg/controller/logstash/pod.go
- change env variable prefix - add validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great - really close, I have a couple of nits on clarifying the messaging on missing clusterName
Co-authored-by: Rob Bavey <rob.bavey@elastic.co>
Co-authored-by: Rob Bavey <rob.bavey@elastic.co>
Co-authored-by: Rob Bavey <rob.bavey@elastic.co>
Co-authored-by: Rob Bavey <rob.bavey@elastic.co>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from the logstash side
@kaisecheng One thing I noticed, that I should have noticed earlier 🤦 - we use |
@robbavey I have changed the suffix to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
👀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// ElasticsearchCluster definition of type to define an Elasticsearch Cluster to be referred to in Logstash pipeline | ||
type ElasticsearchCluster struct { | ||
commonv1.ObjectSelector `json:",omitempty,inline"` | ||
ClusterName string `json:"clusterName,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to prevent user from using a same clusterName
for different clusters with a new validation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would like to improve it in future
Co-authored-by: Michael Morello <michael.morello@gmail.com>
Co-authored-by: Michael Morello <michael.morello@gmail.com>
Co-authored-by: Michael Morello <michael.morello@gmail.com>
This PR allows configuration of
elasticsearchRefs
in CRD to associate Elasticsearch clusters to Logstash. The connectivity info/ credentials are passed via environment variable to Logstash container for referencing in Logstash plugins: logstash-output-elasticsearch, logstash-input-elasticsearch and logstash-filter-elasticsearch.Sample config
Environment variables taking
$clusterName_ES_
as prefix are available in Logstash containerPRODUCTION_ES_HOSTS PRODUCTION_ES_USERNAME PRODUCTION_ES_PASSWORD PRODUCTION_ES_SSL_CERTIFICATE_AUTHORITY # path of ca.crt of production cluster
Sample Logstash pipeline with elasticsearch output using environment variables
The sample logstash yml file as located in config/samples/logstash/logstash_es.yaml will
create:
e2e test
Still todo:
Co-authored-by: Rob Bavey rob.bavey@elastic.co