Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NETOBSERV-773 Copy certificates across namespaces #326

Merged
merged 3 commits into from
May 31, 2023

Conversation

jotak
Copy link
Member

@jotak jotak commented Apr 17, 2023

  • Any certificate (secret/cm) can now be referenced from any namespace, which makes the operator watch the original and copy it to a target namespace. It allows not only to deploy Loki (or Kafka) in any namespace, but also fixes the issue of ebpf pods not having access to kafka CA/key without manual intervention

And quite a big refactoring:

  • New "watchers" and "volumes" packages
  • Creation of volumes is now using a builder-style approach allowing to incrementally add volumes and get at the same time their path. It avoid having discrepancies between mounted volumes and their related path reference.
  • Watching certificates (or any CM/secret) now uses digest hash of content instead of metadata, to avoid triggering pods restart when a cm/secret was changed despite its content remaining the same
  • Some things are moved around to make internal APIs easier to use, less parameters in functions, etc.
  • New extensive integration tests on certificates management

@codecov
Copy link

codecov bot commented Apr 17, 2023

Codecov Report

Merging #326 (ba51186) into main (98372de) will increase coverage by 2.02%.
The diff coverage is 77.00%.

@@            Coverage Diff             @@
##             main     #326      +/-   ##
==========================================
+ Coverage   51.42%   53.45%   +2.02%     
==========================================
  Files          43       45       +2     
  Lines        5186     5328     +142     
==========================================
+ Hits         2667     2848     +181     
+ Misses       2323     2279      -44     
- Partials      196      201       +5     
Flag Coverage Δ
unittests 53.45% <77.00%> (+2.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
api/v1alpha1/flowcollector_types.go 100.00% <ø> (ø)
api/v1alpha1/zz_generated.conversion.go 0.26% <0.00%> (-0.01%) ⬇️
api/v1beta1/flowcollector_types.go 100.00% <ø> (ø)
controllers/ovs/flowsconfig_ovnk_reconciler.go 0.00% <0.00%> (ø)
pkg/helper/flowcollector.go 66.17% <0.00%> (-6.41%) ⬇️
pkg/helper/client_helper.go 53.06% <53.06%> (ø)
controllers/flowcollector_controller.go 57.26% <65.51%> (+3.35%) ⬆️
...trollers/consoleplugin/consoleplugin_reconciler.go 60.96% <67.50%> (-2.03%) ⬇️
...ntrollers/ebpf/internal/permissions/permissions.go 46.62% <68.18%> (+1.11%) ⬆️
...rollers/flowlogspipeline/flp_transfo_reconciler.go 59.15% <70.45%> (-2.39%) ⬇️
... and 17 more

... and 2 files with indirect coverage changes

OlivierCazade
OlivierCazade previously approved these changes May 9, 2023
Copy link
Contributor

@OlivierCazade OlivierCazade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@memodi
Copy link
Contributor

memodi commented May 17, 2023

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 17, 2023
- Any certificate (secret/cm) can now be referenced from any namespace,
  which makes the operator watch the original and copy it to a target
namespace. It allows not only to deploy Loki (or Kafka) in any
namespace, but also fixes the issue of ebpf pods not having access to
kafka CA/key without manual intervention

And quite a big refactoring:
- New "watchers" and "volumes" packages
- Creation of volumes is now using a builder-style approach allowing to
  incrementally add volumes and get at the same time their path. It
avoid having discrepancies between mounted volumes and their related
path reference.
- Watching certificates (or any CM/secret) now uses digest hash of
  content instead of metadata, to avoid triggering pods restart when a
cm/secret was changed despite its content remaining the same
- Some things are moved around to make internal APIs easier to use, less
  parameters in functions, etc.
- New extensive integration tests on certificates management

Fix failing tests
@openshift-ci
Copy link

openshift-ci bot commented May 23, 2023

New changes are detected. LGTM label has been removed.

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 23, 2023
@memodi
Copy link
Contributor

memodi commented May 23, 2023

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 23, 2023
@jotak jotak added ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. and removed ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. labels May 24, 2023
@github-actions
Copy link

New images:

  • quay.io/netobserv/network-observability-operator:338d168
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-338d168
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-338d168

They will expire after two weeks.

Catalog source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-338d168
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

Copy link
Contributor

@memodi memodi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jotak - I think I am hitting a bug w/ KAFKA scenario, where I have both of them in different NS and FLP pods are getting 403 error:

time=2023-05-25T20:34:13Z level=info component=client error=server returned HTTP status 403 Forbidden (403):  fields.level=error fields.msg=final error sending batch host=lokistack-gateway-http.netobserv-loki.svc.cluster.local:8080 module=export/loki status=403

here's loki and kafka config:

{
  "authToken": "FORWARD",
  "batchSize": 10485760,
  "batchWait": "1s",
  "maxBackoff": "5s",
  "maxRetries": 2,
  "minBackoff": "1s",
  "staticLabels": {
    "app": "netobserv-flowcollector"
  },
  "statusTls": {
    "caCert": {
      "namespace": ""
    },
    "enable": false,
    "insecureSkipVerify": false,
    "userCert": {
      "namespace": ""
    }
  },
  "tenantID": "netobserv",
  "timeout": "10s",
  "tls": {
    "caCert": {
      "certFile": "service-ca.crt",
      "name": "lokistack-gateway-ca-bundle",
      "namespace": "netobserv-loki",
      "type": "configmap"
    },
    "enable": true,
    "insecureSkipVerify": false,
    "userCert": {
      "namespace": "netobserv-loki"
    }
  },
  "url": "https://lokistack-gateway-http.netobserv-loki.svc.cluster.local:8080/api/logs/v1/network/"
}

{
  "address": "kafka-cluster-kafka-bootstrap.netobserv-kafka",
  "tls": {
    "caCert": {
      "certFile": "ca.crt",
      "name": "kafka-cluster-cluster-ca-cert",
      "namespace": "netobserv-kafka",
      "type": "secret"
    },
    "enable": true,
    "insecureSkipVerify": false,
    "userCert": {
      "certFile": "user.crt",
      "certKey": "user.key",
      "name": "flp-kafka",
      "namespace": "netobserv-kafka",
      "type": "secret"
    }
  },
  "topic": "network-flows"
}

perhaps we're using incorrect certs to connect with Loki in this scenario?

@jotak
Copy link
Member Author

jotak commented May 26, 2023

@memodi I don't remember if I tested loki AND kafka in different namespaces, so let me get a try. Just in case: you did deploy the clusterrolebinding for reading loki logs just as usual? ie. FLP's service account has permissions to write logs?

@memodi
Copy link
Contributor

memodi commented May 26, 2023

@memodi I don't remember if I tested loki AND kafka in different namespaces, so let me get a try. Just in case: you did deploy the clusterrolebinding for reading loki logs just as usual? ie. FLP's service account has permissions to write logs?

ah, that might be it. I didn't update the CRD to work with KAFKA mode, my bad.

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 26, 2023
@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 26, 2023
@jotak jotak added ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. and removed ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. labels May 26, 2023
@github-actions
Copy link

New images:

  • quay.io/netobserv/network-observability-operator:2d6ecc1
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-2d6ecc1
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-2d6ecc1

They will expire after two weeks.

Catalog source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-2d6ecc1
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 26, 2023
@memodi
Copy link
Contributor

memodi commented May 26, 2023

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 26, 2023
@github-actions
Copy link

New images:

  • quay.io/netobserv/network-observability-operator:ce1b60c
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-ce1b60c
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-ce1b60c

They will expire after two weeks.

Catalog source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-ce1b60c
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@memodi
Copy link
Contributor

memodi commented May 30, 2023

Although I did not find issues for changes in this PR, I came across NETOBSERV-1070 bug for FLP metrics server.

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved QE has approved this pull request label May 30, 2023
@jotak
Copy link
Member Author

jotak commented May 30, 2023

thanks @memodi !
/approve

@openshift-ci
Copy link

openshift-ci bot commented May 30, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jotak
Copy link
Member Author

jotak commented May 31, 2023

(merging .. was previously reviewed before last rebase)

@jotak jotak merged commit ca61fad into netobserv:main May 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. qe-approved QE has approved this pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants