amazonvpc is not working with Ubuntu 22.04(Jammy) #15720

Closed
h3poteto opened this issue Jul 30, 2023 · 15 comments · Fixed by #16313
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@h3poteto
Contributor

/kind bug

1. What kops version are you running? The command kops version, will display this information.

$ kops version
Client version: 1.27.0 (git-v1.27.0)

I tried the same thing with the master branch (a8fa895).

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"archive", BuildDate:"2023-06-15T08:14:06Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:14:49Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

$ kops create -f cluster.yaml
$ kops update cluster --name $CLUSTER_NAME --admin --yes

5. What happened after the commands executed?
A Kubernetes cluster is created, but some pods are not working, so the kops validate command fails.
For example, cert-manager and ebs-csi-node report errors.

$ k get pods -n kube-system 
NAME                                     READY   STATUS              RESTARTS        AGE
aws-cloud-controller-manager-hlxjj       1/1     Running             0               14m
aws-cloud-controller-manager-p2ggb       1/1     Running             0               14m
aws-cloud-controller-manager-p5nk6       1/1     Running             0               13m
aws-iam-authenticator-9zz6q              0/1     ContainerCreating   0               13m
aws-iam-authenticator-mnmgq              0/1     ContainerCreating   0               13m
aws-iam-authenticator-pnncn              0/1     ContainerCreating   0               13m
aws-node-6mbll                           1/1     Running             0               12m
aws-node-84kqw                           1/1     Running             0               14m
aws-node-npfd6                           1/1     Running             0               12m
aws-node-qmcjd                           1/1     Running             0               13m
aws-node-x5z6h                           1/1     Running             0               12m
aws-node-xmm5m                           1/1     Running             0               14m
cert-manager-85495b9754-jnrb9            1/1     Running             0               14m
cert-manager-cainjector-879f4679-tgt42   0/1     CrashLoopBackOff    6 (4m11s ago)   14m
cert-manager-webhook-5c5c9f4f95-wxpgh    0/1     CrashLoopBackOff    6 (4m20s ago)   14m
coredns-69998f855-9kkb8                  0/1     Pending             0               14m
coredns-autoscaler-fcf87bf56-hj4cc       0/1     Pending             0               14m
dns-controller-849b6b44c5-xdjl2          1/1     Running             0               14m
ebs-csi-controller-55847c479b-4dbxz      5/5     Running             0               14m
ebs-csi-controller-55847c479b-bhttn      5/5     Running             0               14m
ebs-csi-node-cpnb6                       2/3     CrashLoopBackOff    6 (4m58s ago)   14m
ebs-csi-node-hj55p                       2/3     CrashLoopBackOff    6 (3m29s ago)   12m
ebs-csi-node-p6hwc                       2/3     CrashLoopBackOff    6 (3m16s ago)   12m
ebs-csi-node-sb7sn                       2/3     CrashLoopBackOff    6 (3m32s ago)   12m
ebs-csi-node-wwkxn                       2/3     CrashLoopBackOff    6 (4m22s ago)   13m
ebs-csi-node-xcgjt                       2/3     CrashLoopBackOff    6 (4m57s ago)   14m
kops-controller-4r62d                    1/1     Running             0               14m
kops-controller-5jg69                    1/1     Running             0               13m
kops-controller-wxttx                    1/1     Running             0               14m
kube-apiserver-i-01a0c4dbe2535a577       2/2     Running             3 (15m ago)     13m
kube-apiserver-i-0cf34834c7d869a4c       2/2     Running             3 (15m ago)     13m
kube-apiserver-i-0e010cb5829cbd426       2/2     Running             4 (14m ago)     12m
pod-identity-webhook-7b4747876c-z2kgw    0/1     Pending             0               14m
pod-identity-webhook-7b4747876c-zwbl4    0/1     Pending             0               14m

ebs-csi-node:

$ k logs -f ebs-csi-node-hj55p --previous
Defaulted container "ebs-plugin" out of: ebs-plugin, node-driver-registrar, liveness-probe
I0730 11:38:38.590223       1 node.go:91] regionFromSession Node service
I0730 11:38:38.590349       1 metadata.go:85] retrieving instance data from ec2 metadata
W0730 11:38:44.919462       1 metadata.go:88] ec2 metadata is not available
I0730 11:38:44.919486       1 metadata.go:96] retrieving instance data from kubernetes api
I0730 11:38:44.920033       1 metadata.go:101] kubernetes api is available
panic: error getting Node i-0d639425eec9d7b0b: Get "https://100.64.0.1:443/api/v1/nodes/i-0d639425eec9d7b0b": dial tcp 100.64.0.1:443: i/o timeout

goroutine 1 [running]:
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.newNodeService(0xc000638640)
        /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/node.go:94 +0x345
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.NewDriver({0xc00023bf30, 0x8, 0x3684458?})
        /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/driver.go:95 +0x393
main.main()
        /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:46 +0x37d

cert-manager-webhook:

$ k logs -f cert-manager-webhook-5c5c9f4f95-wxpgh
W0730 11:43:28.030767       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
E0730 11:43:58.034144       1 webhook.go:123] "cert-manager: Failed initialising server" err="error building admission chain: Get \"https://100.64.0.1:443/api\": dial tcp 100.64.0.1:443: i/o timeout"

6. What did you expect to happen?
All pods work fine, and kops validate succeeds.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: null
  name: playground.k8s.h3poteto.dev
spec:
  api:
    dns: {}
  authentication:
    aws: {}
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://my-playground-store/playground.k8s.h3poteto.dev
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-ap-northeast-1a
      name: a
    - encryptedVolume: true
      instanceGroup: control-plane-ap-northeast-1c
      name: c
    - encryptedVolume: true
      instanceGroup: control-plane-ap-northeast-1d
      name: d
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-ap-northeast-1a
      name: a
    - encryptedVolume: true
      instanceGroup: control-plane-ap-northeast-1c
      name: c
    - encryptedVolume: true
      instanceGroup: control-plane-ap-northeast-1d
      name: d
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    maxPods: 50
  kubernetesApiAccess:
  - 0.0.0.0/0
  - ::/0
  kubernetesVersion: 1.27.4
  masterPublicName: api.playground.k8s.h3poteto.dev
  networkCIDR: 172.16.0.0/16
  networkID: vpc-00ea717e1640613ea
  networking:
    amazonvpc: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://my-irsa-store
    enableAWSOIDCProvider: true
  podIdentityWebhook:
    enabled: true
  certManager:
    enabled: true
    managed: true
  sshAccess: []
  subnets:
  - id: subnet-0619c5276e1edce32
    cidr: 172.16.0.0/20
    name: ap-northeast-1a
    type: Public
    zone: ap-northeast-1a
  - id: subnet-04acc221370b74258
    cidr: 172.16.16.0/20
    name: ap-northeast-1c
    type: Public
    zone: ap-northeast-1c
  - id: subnet-06d07ea961e7e0007
    cidr: 172.16.32.0/20
    name: ap-northeast-1d
    type: Public
    zone: ap-northeast-1d
  topology:
    dns:
      type: Public

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: playground.k8s.h3poteto.dev
  name: control-plane-ap-northeast-1a
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230711
  machineType: t3.small
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - ap-northeast-1a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: playground.k8s.h3poteto.dev
  name: control-plane-ap-northeast-1c
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230711
  machineType: t3.small
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - ap-northeast-1c

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: playground.k8s.h3poteto.dev
  name: control-plane-ap-northeast-1d
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230711
  machineType: t3.small
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - ap-northeast-1d

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: playground.k8s.h3poteto.dev
  name: nodes-ap-northeast-1a
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230711
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  role: Node
  subnets:
  - ap-northeast-1a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: playground.k8s.h3poteto.dev
  name: nodes-ap-northeast-1c
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230711
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  role: Node
  subnets:
  - ap-northeast-1c

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: playground.k8s.h3poteto.dev
  name: nodes-ap-northeast-1d
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230711
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  role: Node
  subnets:
  - ap-northeast-1d

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else do we need to know?
If I specify cilium as the CNI in networking, it works fine (I also tried Cilium with ENI, and it works fine).
If I change the image to Ubuntu 20.04 (I tried 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20211015), it works fine with amazonvpc.

In conclusion, I suspect the combination of amazonvpc and Ubuntu 22.04 is the problem.
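
For context, the crashing pods above are all timing out while reaching the in-cluster API service IP (100.64.0.1). If the cause is systemd-networkd on Jammy taking over the secondary ENIs and routes that the AWS VPC CNI configures (one theory for this class of failure, not confirmed in this issue), a possible mitigation is a networkd drop-in that leaves those interfaces unmanaged. This is only a sketch; the file name and the primary interface name (ens5) are assumptions and will vary by instance type and image, so the match may need tightening:

# /etc/systemd/network/99-vpc-cni-unmanaged.network  (hypothetical file name)
# Assumption: ens5 is the node's primary interface; everything else is a CNI-attached ENI.
[Match]
Name=!ens5

[Link]
# Tell systemd-networkd not to bring up or reconfigure the matched interfaces,
# so addresses and routes installed by the CNI are left alone.
Unmanaged=yes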

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 30, 2023
@hakman
Member

hakman commented Jul 30, 2023

@h3poteto See aws/amazon-vpc-cni-k8s#2103

@h3poteto
Contributor Author

Thank you, I got it.
If this issue does not need to be tracked in the kOps repository, please close it.

@hakman
Member

hakman commented Jul 31, 2023

Let's keep it open for some time.

@colt-1

colt-1 commented Oct 1, 2023

This still seems to be happening with: ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230919

@hakman
Member

hakman commented Oct 2, 2023

@colathro this will continue happening until someone from AWS fixes aws/amazon-vpc-cni-k8s#2103.

@pmankad96

pmankad96 commented Oct 13, 2023

It also prevents a new kops cluster with networking=amazonvpc from coming up healthy. In my case the coredns-xx and ebs-csi-node pods kept crashing. For coredns the log read: plugin/error timeout when trying to connect to the Amazon-provided DNS server. For ebs-csi-node the error was about being unable to get the Node (it was trying 100.64. - not sure why). The workaround is to use a 20.04 image instead. The error messages are so cryptic that it took me a while to figure this out.

@btalbot

btalbot commented Oct 19, 2023

I ran into this as well while upgrading a test cluster from Kubernetes 1.26.5 to 1.27.6 using kops 1.28.

The error from the ebs-plugin container of an ebs-csi-node pod running on Ubuntu 22.04 is shown below. Reverting the node images to Ubuntu 20.04 (ubuntu-focal-20.04-amd64-server-20230502) allowed a rolling-restart with --cloudonly to cleanly restart the affected control-plane nodes.

+ kube-system ebs-csi-node-sjkv7 › ebs-plugin
kube-system ebs-csi-node-sjkv7 ebs-plugin I1018 23:46:11.891261       1 metadata.go:101] kubernetes api is available
kube-system ebs-csi-node-sjkv7 ebs-plugin panic: error getting Node i-04bddcf2fcb369bae: Get "https://100.64.0.1:443/api/v1/nodes/i-04bddcf2fcb369bae": dial tcp 100.64.0.1:443: i/o timeout
kube-system ebs-csi-node-sjkv7 ebs-plugin
kube-system ebs-csi-node-sjkv7 ebs-plugin goroutine 1 [running]:
kube-system ebs-csi-node-sjkv7 ebs-plugin github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.newNodeService(0xc00003f540)
kube-system ebs-csi-node-sjkv7 ebs-plugin 	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/node.go:94 +0x345
kube-system ebs-csi-node-sjkv7 ebs-plugin github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.NewDriver({0xc00054df30, 0x8, 0x3684458?})
kube-system ebs-csi-node-sjkv7 ebs-plugin 	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/driver.go:95 +0x393
kube-system ebs-csi-node-sjkv7 ebs-plugin main.main()
kube-system ebs-csi-node-sjkv7 ebs-plugin 	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:46 +0x37d
- kube-system ebs-csi-node-sjkv7 › ebs-plugin
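
For anyone doing the same revert, the rough sequence is below. The instance group name is taken from the manifest earlier in this issue and the focal AMI is the one h3poteto mentioned; substitute your own group names and a current 20.04 image.

# Point each affected instance group back at an Ubuntu 20.04 (focal) image, e.g.
#   spec.image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20211015
kops edit ig nodes-ap-northeast-1a --name $CLUSTER_NAME
kops update cluster --name $CLUSTER_NAME --yes
# --cloudonly rolls the instances without validating against the Kubernetes API
kops rolling-update cluster --name $CLUSTER_NAME --cloudonly --yes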

@btalbot

btalbot commented Oct 19, 2023

Can't kops work around this issue by simply NOT updating to Ubuntu 22.04 for instances running in AWS? Seems silly to keep breaking everyone's clusters like this.

@hakman
Member

hakman commented Oct 19, 2023

Can't kops work around this issue by simply NOT updating to Ubuntu 22.04 for instances running in AWS? Seems silly to keep breaking everyone's clusters like this.

kOps is not just about clusters using the AWS VPC CNI. All other CNIs and components work fine with Ubuntu 22.04.
Ubuntu 22.04 is 1.5 years old and AWS has not added support for it, with no plan to do so in the near future.

It is probably a good idea to add something that locks clusters using the AWS VPC CNI to Ubuntu 20.04.
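
In the meantime, a user-side pin per instance group would look roughly like the following (reusing the focal AMI name quoted earlier in this thread; any current 20.04 image works the same way):

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: playground.k8s.h3poteto.dev
  name: nodes-ap-northeast-1a
spec:
  # Pin to Ubuntu 20.04 (focal) until the AWS VPC CNI supports 22.04
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20211015
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  role: Node
  subnets:
  - ap-northeast-1a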

@doryer

doryer commented Nov 1, 2023

Can't kops work around this issue by simply NOT updating to Ubuntu 22.04 for instances running in AWS? Seems silly to keep breaking everyone's clusters like this.

kOps is not just about clusters using the AWS VPC CNI. All other CNIs and components work fine with Ubuntu 22.04. Ubuntu 22.04 is 1.5 years old and AWS has not added support for it, with no plan to do so in the near future.

It is probably a good idea to add something that locks clusters using the AWS VPC CNI to Ubuntu 20.04.

It is worth marking it as unstable (https://github.com/kubernetes/kops/blob/master/docs/operations/images.md), as we tried to upgrade the Ubuntu version and ran into issues in the cluster.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 31, 2024
@h3poteto
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 31, 2024
@hakman
Member

hakman commented Jan 31, 2024

/cc @moshevayner

@moshevayner
Member

This is related to #16255
I'm working on a fix; hopefully I'll have a PR up in the next couple of days 🙏🏼🙏🏼

@moshevayner
Member

/assign
