Existing pod network not cleanedup when using security groups with stateful sets #1374

Closed
hintofbasil opened this issue Feb 5, 2021 · 13 comments

@hintofbasil

hintofbasil commented Feb 5, 2021

What happened:
When using security groups with stateful sets we noticed that pods often lost connectivity when restarted.
The security group they were bound to allowed all connections inbound and outbound on 0.0.0.0/0.

After some investigation we discovered the bug seems to affect pods re-created on the same node with the same name.

Attached logs:
eks_i-08aff468a2d6ce527_2021-02-05_1421-UTC_0.6.2.tar.gz

What you expected to happen:

The pod should launch normally and keep its network connectivity.

How to reproduce it (as minimally and precisely as possible):

Create a SecurityGroupPolicy:

apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: cni-test
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: cni-test
  securityGroups:
    groupIds:
      - <id>

Create a pod that uses the security group policy:

apiVersion: v1
kind: Pod
metadata:
  name: cni-test
  namespace: default
  labels:
    app: cni-test
spec:
  containers:
    - name: alpine
      image: alpine
      command:
      - sleep
      - "1000000000"

Delete the pod, then recreate it, ensuring it is scheduled onto the same node. Then attempt to make an outbound connection from inside the pod:

apk add curl
curl 1.1.1.1
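To make sure the recreated pod lands on the same node, one option is to pin it with a `kubernetes.io/hostname` node selector. This pinning is not part of the original reproduction steps; it is a minimal sketch, assuming `<node-name>` is replaced with the node the first pod ran on (the NODE column of `kubectl get pods -o wide`) and that the node's `kubernetes.io/hostname` label matches that name, which you can confirm with `kubectl get node <node-name> --show-labels`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cni-test            # same name as the deleted pod
  namespace: default
  labels:
    app: cni-test           # matched by the SecurityGroupPolicy podSelector
spec:
  # Assumption: pin the pod to the node the first instance ran on.
  nodeSelector:
    kubernetes.io/hostname: <node-name>
  containers:
    - name: alpine
      image: alpine
      command:
      - sleep
      - "1000000000"
```

A node selector keeps the pod going through the scheduler, so any extended resources the security-group webhook adds are still accounted for; setting `spec.nodeName` directly would also pin the pod but bypasses scheduling.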

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-12T01:09:16Z", GoVersion:"go1.15.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-eks-d1db3c", GitCommit:"d1db3c46e55f95d6a7d3e5578689371318f95ff9", GitTreeState:"clean", BuildDate:"2020-10-20T22:18:07Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
  • CNI Version
v1.7.5-eksbuild.1
  • OS (e.g: cat /etc/os-release):
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
  • Kernel (e.g. uname -a):
Linux ip-172-16-214-206.eu-central-1.compute.internal 4.14.209-160.339.amzn2.x86_64 #1 SMP Wed Dec 16 22:44:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
@SaranBalaji90
Contributor

SaranBalaji90 commented Feb 5, 2021

I'm trying to reproduce the issue but have had no luck so far. I will retry a few more times. Can you let me know the name of the affected pod in the attached logs? I couldn't find the cni-test pod in them, and once I know which pod to look at I can dig through the logs. If it's happening consistently on your cluster, we can schedule a call to dig further into this issue (you can reach me at srajakum@amazon.com).

srajakum@147dda5e4851 yaml-files % kubectl get pods -owide          
NAME                                READY   STATUS              RESTARTS   AGE     IP                NODE                                           NOMINATED NODE   READINESS GATES
cni-test                            1/1     Running             0          6s      192.168.160.222   ip-192-168-65-167.us-west-2.compute.internal   <none>           <none>

srajakum@147dda5e4851 yaml-files % kubectl describe pod cni-test    
Annotations:  kubernetes.io/psp: eks.privileged
              vpc.amazonaws.com/pod-eni:
                [{"eniId":"eni-0529ea213a202db76","ifAddress":"06:03:de:64:a3:05","privateIp":"192.168.160.222","vlanId":1,"subnetCidr":"192.168.160.0/19"...

srajakum@147dda5e4851 yaml-files % kubectl exec -it cni-test /bin/sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ # apk add curl
...
/ # curl 1.1.1.1
<html>

<!--- recreated the pod !-->

srajakum@147dda5e4851 yaml-files % kubectl get pods -owide          
NAME                                READY   STATUS              RESTARTS   AGE     IP                NODE                                           NOMINATED NODE   READINESS GATES
cni-test                            1/1     Running             0          6s      192.168.176.242   ip-192-168-65-167.us-west-2.compute.internal   <none>           <none>

Annotation on the pod:
    vpc.amazonaws.com/pod-eni: '[{"eniId":"eni-0cce00f448a476759","ifAddress":"06:17:57:b6:5b:79","privateIp":"192.168.176.242","vlanId":2,"subnetCidr":"192.168.160.0/19"}]'

Able to curl again.

@SaranBalaji90
Contributor

SaranBalaji90 commented Feb 5, 2021

We should probably add the pod's unique ID to the annotation as well, and have ipamd return the ENI details based on that unique ID. This will ensure that even when kubelet invokes delete for the old pod after its network has already been removed, we won't delete the new pod's network (AddNetwork and DelNetwork). I've also created aws/amazon-vpc-resource-controller-k8s#19 to track this enhancement.

@hintofbasil
Copy link
Author

The pod was called prometheus-scaling-0, I believe. It was definitely a pod whose name started with prometheus-.

It seems that updating the CNI to v1.7.8 fixes the issue.

@SaranBalaji90
Contributor

SaranBalaji90 commented Feb 5, 2021

Thanks for the info @hintofbasil. Maybe it was this pod, prometheus-scaling-1? The sequence looks right for it too. Let me know if you notice the issue again; I will be happy to jump on a call and assist. 1.7.8 has a fix for the pod deletion path, so I'm not sure whether that is what's helping here.

@hintofbasil
Author

Yes, that would be the one. I should have written it down earlier.
We will keep an eye on it and see if it crops up again. For now, we are going to update our clusters to use 1.7.8.

Thanks, Sri

@SaranBalaji90
Contributor

SaranBalaji90 commented Feb 5, 2021

@hintofbasil, can you ensure you have terminationGracePeriodSeconds set in your YAML? For pods using security groups, the CNI plugin describes the pod during deletion. If terminationGracePeriodSeconds is not set, the pod's data can be removed from the Kubernetes datastore (etcd) before that happens, and the CNI plugin is left with dangling ip rule entries that affect the pod network.
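For a StatefulSet, the field belongs in the pod template's spec. A minimal sketch, reusing the cni-test labels from the reproduction steps; the StatefulSet name, the `serviceName` (assumed to point at an existing headless Service), and the value `30` are illustrative assumptions rather than values recommended in this thread:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cni-test            # illustrative name
  namespace: default
spec:
  serviceName: cni-test     # assumed headless Service
  replicas: 1
  selector:
    matchLabels:
      app: cni-test
  template:
    metadata:
      labels:
        app: cni-test       # matched by the SecurityGroupPolicy podSelector
    spec:
      # Set explicitly so the pod object is not removed immediately on deletion;
      # 30 is only an example value.
      terminationGracePeriodSeconds: 30
      containers:
        - name: alpine
          image: alpine
          command:
          - sleep
          - "1000000000"
```

Placing the field in `spec.template.spec` means every pod the StatefulSet recreates (for example `cni-test-0` on the same node) inherits it.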

@hintofbasil
Copy link
Author

@sri, we do not. We only set terminationGracePeriodSeconds.
We can try setting that value and using the 1.7.5 CNI. An experiment for Monday.

@SaranBalaji90
Copy link
Contributor

@hintofbasil, sorry, I meant terminationGracePeriodSeconds (the one you mentioned). I've fixed my comment.

@SaranBalaji90
Contributor

@hintofbasil, I have created a PR to clean up the pod network even if pods are force deleted by the controllers. This will help with the network issues you noticed with new pods.

@hintofbasil
Author

Hi Sri,

It seems we were a bit early to announce that 1.7.8 fixed the issue. Unfortunately we are still seeing it.

I've even installed a version built from master (99ecb4c).
It now always occurs after a StatefulSet pod is scheduled onto the same node. However, it seems to have fixed the issue with pods of the same name.

I've attached further logs from the master branch version. This time the failing pod is prometheus-prometheus-operator-prometheus-0.

eks_i-06368ccddf952df32_2021-02-08_1603-UTC_0.6.2.tar.gz

@SaranBalaji90
Copy link
Contributor

SaranBalaji90 commented Feb 8, 2021

@hintofbasil, our next release, due this week, will clean up the dangling rules that were blocking pod traffic. These occur when the pod is deleted from the K8s datastore (etcd) before the CNI is able to read the pod information (its annotation) during deletion. The fix I mentioned above (which is merged to master and the 1.7 branch) will take care of cleaning up all dangling rules. This will be prevented once we have kubernetes/kubernetes#69882. kubernetes/kubernetes#88543 might also help, to some extent, to avoid the race condition.

Regarding prometheus-prometheus-operator-prometheus-0, I see that the pod network is set up properly. Can you send your cluster ARN to srajakum@amazon.com so I can investigate further?

SaranBalaji90 changed the title from "Incorrect ENI when using security groups with stateful sets" to "Existing pod network not cleanedup when using security groups with stateful sets" on Feb 8, 2021
@SaranBalaji90
Contributor

SaranBalaji90 commented Feb 9, 2021

Local store support for pods using security groups is tracked in #1313. It will avoid calling the API server on the deletion path; instead, the CNI will read the associated VLAN from a local file.

@SaranBalaji90
Contributor

Closing this, as we are tracking the issue in #1313, and our documentation at https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html has been updated to include terminationGracePeriodSeconds in the pod spec, which avoids deleting the pod objects from etcd before the network is cleaned up.
