Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Network Policy Agent - Initial commit #1

Closed
wants to merge 2 commits into from
Closed

Network Policy Agent - Initial commit #1

wants to merge 2 commits into from

Conversation

achevuru
Copy link
Contributor

@achevuru achevuru commented Jul 6, 2023

Issue #, if available:
Initial commit

Description of changes:
Initial commit for AWS Network Policy agent

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

yum install -y coreutils kernel-devel elfutils-libelf-devel zlib-devel bpftool libbpf-devel && \
yum clean all

COPY Makefile ./

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COPY Makefile ./ seem redundant with the COPY . ./

COPY . ./
RUN make build-bpf

# Use distroless as minimal base image to package the manager binary

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inaccurate comments

# Image URL to use all building/pushing image targets
IMAGE ?= amazon/aws-network-policy-agent
VERSION ?= $(shell git describe --tags --always --dirty || echo "unknown")
IMAGE_NAME = $(IMAGE)$(IMAGE_ARCH_SUFFIX):$(VERSION)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is the IMAGE_ARCH_SUFFIX defined?

@@ -0,0 +1,139 @@
/*

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems redundant definition with the network policy controller, i assume we'll switch to depends on the controller's crd once it's published.


const (
envLocalConntrackCacheCleanupPeriod = "CONNTRACK_CACHE_CLEANUP_PERIOD"
defaultLocalConntrackCacheCleanupPeriod = 300
Copy link

@M00nF1sh M00nF1sh Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: better to name the variable xxxUnit e.g. defaultLocalConntrackCacheCleanupPeriodSeconds if it's numeric value without unit.


// PolicyEndpointsReconciler reconciles a PolicyEndpoints object
type PolicyEndpointsReconciler struct {
K8sClient client.Client
Copy link

@M00nF1sh M00nF1sh Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

  1. the usage of Public & Private members seems arbitrary, members like K8sClient /Scheme don't need to be public
  2. the class PolicyEndpointsReconciler itself don't need to be a public as well. (only NewPolicyEndpointsReconciler need to be public)


r.nodeIP, err = imds.GetNodeAddress(enableIPv6)
if err != nil {
log.Error(err, "Unable to derive Node IP. Abort....")
Copy link

@M00nF1sh M00nF1sh Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should return error here and skip the rest processing when error encountered
otherwise, this err might be overwrite to be nil and became ignored by the r.ebpfClient, err = ebpf.NewBpfClient

}

conntrackTTL := r.getLocalConntrackCacheCleanupPeriod()
r.ebpfClient, err = ebpf.NewBpfClient(&r.policyEndpointeBPFContext, &r.IngressProgPodMap,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this component's functionality apparently goes beyond a simple "bpf's client". It seems more like a eBPFNetworkEnforcer /EBPFNetworkPolicyManager/networkPolicyEbpfController (i'm bad at naming as well)

func (r *PolicyEndpointsReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
r.Log.Info("Received a new reconcile request", "req", req)
if err := r.reconcile(ctx, req); err != nil {
r.Log.Info("Reconcile error - requeue the request", "err", err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: better to be r.Log.Error, this comment applies to other similar error logs

func (r *PolicyEndpointsReconciler) reconcile(ctx context.Context, req ctrl.Request) error {
policyEndpoint := &policyk8sawsv1.PolicyEndpoint{}
if err := r.K8sClient.Get(ctx, req.NamespacedName, policyEndpoint); err != nil {
if err = client.IgnoreNotFound(err); err == nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: better to use apierror.IsNotFoundError instead of client.IgnoreNotFound(err) when check whether error is not found. client.IgnoreNotFound(err) should be used with return statements only.

return r.cleanUpPolicyEndpoint(ctx, req)
}
r.Log.Info("Unable to get policy endpoint spec", "policyendpoint", req.NamespacedName, "error", err)
return client.IgnoreNotFound(err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the IgnoreNotFound calls seems unnecessary given it's already tested against not found.

}

func (r *PolicyEndpointsReconciler) cleanUpPolicyEndpoint(ctx context.Context, req ctrl.Request) error {
r.Log.Info("Clean Up PolicyEndpoint resources for", "name:", req.NamespacedName.Name)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logs here seems need to log both name and namespace

req.NamespacedName.Namespace)

start := time.Now()
duration := msSince(start)
Copy link

@M00nF1sh M00nF1sh Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems this line is in wrong place, the duration here will always close to zero?

duration := msSince(start)

if targetPods, ok := r.policyEndpointSelectorMap.Load(policyEndpointIdentifier); ok {
err := r.updatePods(ctx, req, targetPods.([]types.NamespacedName))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: better to use

if err := ...; err != nil {
     ....
}

return nil
}

func (r *PolicyEndpointsReconciler) updatePods(ctx context.Context, req ctrl.Request,
Copy link

@M00nF1sh M00nF1sh Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's have some comments for this function.

nit: pick a more meaningful name than updatePods if possible. (e.g. updateEbpfProbesForPods)

// 3. If there are no more active policies against this pod. Detach and delete the probes and
// corresponding BPF maps. We will also clean up eBPF pin paths under BPF FS.
for _, targetPod := range targetPods {
r.Log.Info("Updating Pod: ", "Name: ", targetPod.Name, "Namespace: ", targetPod.Namespace)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we are using structured logging, we don't need the : in field names.
this comment applies to other logging statements as well.

for _, targetPod := range targetPods {
r.Log.Info("Updating Pod: ", "Name: ", targetPod.Name, "Namespace: ", targetPod.Namespace)
podIdentifier := utils.GetPodIdentifier(targetPod.Name, targetPod.Namespace)
err = r.cleanupeBPFProbes(ctx, targetPod, podIdentifier, req.NamespacedName.Name)
Copy link

@M00nF1sh M00nF1sh Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the input to this function seems weird to me

  1. it accepts the same namespacedName of pod twice(targetPod/podIdentifier) in different format. why not put the podIdentifier calculation inside cleanupeBPFProbes?
  2. why only name of policyendpoint is passed in? (the namespace is not passed)

(disclaimer: i haven't read following code yet)

err = r.cleanupeBPFProbes(ctx, targetPod, podIdentifier, req.NamespacedName.Name)
if err != nil {
r.Log.Info("Cleanup/Update unsuccessful for Pod ", "Name: ", targetPod.Name, "Namespace: ", targetPod.Namespace)
// we don't want to return an error right away but instead attempt to clean up all the pods

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// in the list before returning
}
}
return err

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this don't work. imaging there is two pod, cleanup for first pod failed, and cleanup for second pod succeed, but we are returning nil instead of error here.

consider use errors.Aggregate here:
https://github.com/kubernetes/apimachinery/blob/master/pkg/util/errors/errors.go

r.Log.Info("Processing: ", "Policy Endpoint Identifer: ", policyEndpointIdentifier)

start := time.Now()
duration := msSince(start)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duration is 0....seems meaningless.

r.Log.Info("Error updating the older pods") //TODO - we need to continue and not bail out.retry?
}

for podIdentifier, _ := range podIdentifiers {
Copy link

@M00nF1sh M00nF1sh Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seem the bool value of podIdentifiers map is not used? if so, why return a map in the first place..

for podIdentifier, _ := range podIdentifiers {
// Derive Ingress IPs from the PolicyEndpoint
ingressRules, egressRules, isIngressIsolated, isEgressIsolated, err := r.deriveIngressAndEgressFirewallRules(ctx, policyEndpoint, podIdentifier,
policyEndpoint.ObjectMeta.Namespace)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can use policyEndpoint.Namespace directly


for podIdentifier, _ := range podIdentifiers {
// Derive Ingress IPs from the PolicyEndpoint
ingressRules, egressRules, isIngressIsolated, isEgressIsolated, err := r.deriveIngressAndEgressFirewallRules(ctx, policyEndpoint, podIdentifier,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly, this function interface is werid, you passed both policyEndpoint, policyEndpoint.Namespace to it. unless there are cases where a different namespace than policyEndpoint.Namespace will be passed, it's better to get namespace from policyEndpoint within deriveIngressAndEgressFirewallRules

return nil
}

func (r *PolicyEndpointsReconciler) configureeBPFProbes(ctx context.Context, podIdentifier string,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the mixed usage of podIdentifier and types.NamespacedName really hurts readability a lot...
can we stick to one of them in all controller code to present pod/policyendpoint's id? e.g. you should be able to use types.NamespacedName as key to sync.map.

@achevuru achevuru marked this pull request as draft July 18, 2023 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants