This repository has been archived by the owner on Jul 20, 2024. It is now read-only.

Instance stuck if ENI wasn't attached properly #57

Open
LiranV opened this issue Jan 4, 2023 · 1 comment

LiranV commented Jan 4, 2023

Hello,
I've encountered the following issue:

  1. The NAT EC2 instance needs to be replaced due to failure or spot termination.
  2. The original instance is removed and the ASG is spawning a new one.
  3. In the meantime the ENI that was used by the instance is still not available for reattachment.
  4. The new instance starts but fails to attach the ENI and gets stuck in a loop while not forwarding traffic.

This happens because the aws ec2 attach-network-interface command in the runonce.sh script fails, but the script still moves on to starting the snat service.

In the snat.sh script (run by snat.service) we have the following loop:

while ! ip link show dev eth1; do
  sleep 1
done

This loop will run forever, because the eth1 interface never becomes available.

Possible solutions:

  1. Add a check after aws ec2 attach-network-interface to verify that the interface was actually attached (or check the return code), and fail in some way if it was not.
  2. Make the loop give up after a while instead of running forever, so that users of the module can add their own script to detect this condition and handle it however they see fit.
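
As a rough illustration of the second option, the wait loop could be given a deadline. This is only a sketch, not the module's actual code: the function name wait_for_interface, its arguments, and the 60-second default are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_interface is a hypothetical helper, not part of snat.sh.
# It polls for a network device once per second and gives up after a timeout
# instead of looping forever.

wait_for_interface() {
  local dev="$1"
  local timeout="${2:-60}"   # seconds to wait; the default is an arbitrary assumption
  local waited=0

  while ! ip link show dev "$dev" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $dev" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

In snat.sh this might be called as wait_for_interface eth1 60 || exit 1, so that the service exits with a failure status that a user-supplied script or a systemd setting can react to.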

hnryjms commented Jan 14, 2024

Can we just terminate the instance if the aws ec2 attach-network-interface command fails? Presumably the ENI will free up after a minute or two, and the second or third EC2 box launched by the Auto-Scaling Group would succeed in attaching the ENI.
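
A minimal sketch of that idea, assuming the instance ID and ENI ID are already known to the script and that the instance role is allowed to call ec2:TerminateInstances. The function names, the retry count, the delay, and the device index of 1 are assumptions for illustration, not what runonce.sh currently does.

```shell
#!/usr/bin/env bash
# Sketch only: both functions are hypothetical and not taken from runonce.sh.

# Try to attach the ENI a few times, in case it is still being released
# by the previous instance.
attach_eni_with_retry() {
  local instance_id="$1"
  local eni_id="$2"
  local attempts="${3:-5}"   # assumed default
  local delay="${4:-10}"     # assumed default, in seconds
  local i=1

  while [ "$i" -le "$attempts" ]; do
    if aws ec2 attach-network-interface \
         --instance-id "$instance_id" \
         --network-interface-id "$eni_id" \
         --device-index 1; then   # device index 1 is assumed to map to eth1
      return 0
    fi
    echo "attach attempt $i/$attempts failed" >&2
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# If the ENI cannot be attached, terminate this instance so the
# Auto Scaling Group launches a replacement that can try again.
attach_or_terminate() {
  local instance_id="$1"
  if ! attach_eni_with_retry "$@"; then
    echo "could not attach ENI, terminating $instance_id" >&2
    aws ec2 terminate-instances --instance-ids "$instance_id"
    return 1
  fi
  return 0
}
```

The script would then only go on to start the snat service when attach_or_terminate returns success. Retrying a few times first avoids cycling through instances when the ENI frees up within seconds.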

Edit: PR #72 also seems pretty good. How come it's not merged?
