
Retaining the cluster is causing containers to keep the port busy #20

Closed
ivanrodjr opened this issue Mar 20, 2018 · 6 comments

@ivanrodjr

If the cluster is retained, this happens after trying to do another test run:

lucy_1 | Minions at 172.31.36.28,172.31.30.216,172.31.35.238,172.31.18.154,172.31.43.111,172.31.46.132,172.31.33.178,172.31.30.205,172.31.37.225,172.31.43.145,172.31.43.198,172.31.30.29,172.31.11.207,172.31.32.122,172.31.2.35,172.31.14.72,172.31.9.104,172.31.15.244,172.31.15.123,172.31.1.211,172.31.5.13,172.31.36.53,172.31.41.171,10.74.1.22,10.74.1.119,172.31.32.24,172.31.45.162,172.31.37.187,172.31.41.38,172.31.34.139,172.31.24.30,172.31.9.165,172.31.36.205,172.31.38.45,10.0.1.177,172.31.41.12,172.31.12.89,172.31.32.49,172.31.0.239,172.31.34.248,None,172.31.25.152,172.31.46.203,172.31.5.71,172.31.24.167,172.31.24.135,172.31.33.56,172.31.33.17,172.31.44.126,172.31.46.158,172.31.5.203,172.31.33.136,172.31.37.190,172.31.33.176,172.31.28.80,172.31.1.166,172.31.11.91,172.31.45.134,None,172.31.4.6,172.31.3.89,172.31.28.126,172.31.37.114,172.31.15.153,172.31.10.180,172.31.44.44,172.31.41.248,172.31.35.119,172.31.41.46,10.0.0.121,172.31.35.74,172.31.22.195,172.31.32.129,172.31.38.115,172.31.36.18,172.31.33.150,172.31.26.188,172.31.39.6,172.31.0.110,172.31.40.125,172.31.44.231,10.0.0.180,172.31.2.177,172.31.40.134,172.31.15.182,172.31.36.44,172.31.46.174,172.31.16.199,172.31.42.191,None,172.31.39.72,None,172.31.40.202,172.31.7.248,172.31.33.53,172.31.16.193,172.31.0.209,172.31.34.62,None,172.31.43.154,172.31.38.100,172.31.9.249,172.31.10.0,172.31.43.78,172.31.24.144,172.31.31.235,172.31.37.111,172.31.24.85,172.31.42.17,172.31.16.131,172.31.40.163,172.31.14.237,172.31.46.173,172.31.32.112,None,172.31.4.1,172.31.39.60,172.31.15.206,172.31.40.59,172.31.2.192,172.31.10.41,10.0.1.107,172.31.46.185,172.31.3.93,
lucy_1 | Copying /plans/demo.jmx to Gru
lucy_1 | ssh: Could not resolve hostname 54.233.252.21754.207.105.77: Name does not resolve
lucy_1 | lost connection
lucy_1 | Running Docker to start JMeter in Gru mode
lucy_1 | ssh: Could not resolve hostname 54.233.252.21754.207.105.77: Name does not resolve
lucy_1 | Copying results from Gru
lucy_1 | ssh: Could not resolve hostname 54.233.252.21754.207.105.77: Name does not resolve
lucy_1 | cluster/JMeter is retained upon request.

or:

docker: Error response from daemon: driver failed programming external connectivity on endpoint lucid_clarke (355735cf1d09ee6038688d2e5f61b2ede2680f76b8d4d7c9e05561d2518f6711): Bind for 0.0.0.0:51000 failed: port is already allocated

Also, I had to comment out this line (# - RETAIN_CLUSTER=false) or remove it in order to get the cluster deleted; probably another issue altogether.
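
For the "port is already allocated" error above, one possible cleanup (a sketch, not something jmeter-ecs does itself) is to find whichever leftover container is still publishing port 51000 and remove it before re-running, assuming the Docker CLI on the host supports the publish filter:

# Locate the leftover container that still has 0.0.0.0:51000 bound
docker ps --filter "publish=51000"

# Remove it by ID so the next run can bind the port again
docker rm -f <container-id>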

@dsperling
Member

A MINION_COUNT of 125. Impressive.

We have seen this issue infrequently at lower minion counts, but it appears to happen frequently with larger counts.

The method of determining the Gru instance from the minions is problematic, as it requires that all 125 instances (in your case) have a runningTasksCount of 1:

jmeter-ecs/lucy/lucy.sh

Lines 111 to 117 in 068cfbb

GRU_INSTANCE_ID=$(aws ecs describe-container-instances --cluster $CLUSTER_NAME \
--container-instances $CONTAINER_INSTANCE_IDS --query 'containerInstances[*].[ec2InstanceId,runningTasksCount]' --output text | grep '\t0' | awk '{print $1}')
echo "Gru instance ID: $GRU_INSTANCE_ID"
MINION_INSTANCE_IDS=$(aws ecs describe-container-instances --cluster $CLUSTER_NAME \
--container-instances $CONTAINER_INSTANCE_IDS --query 'containerInstances[*].[ec2InstanceId,runningTasksCount]' --output text | grep '\t1' | awk '{print $1}')
echo "Minion instances IDs: $MINION_INSTANCE_IDS"

If one of the minions crashes, its runningTasksCount drops to 0, the code fragment above matches two instances, and their addresses get concatenated into an illegal IP address for Gru: 54.233.252.21754.207.105.77
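
Not the project's actual fix, but as a sketch of a possible guard (assuming bash and the AWS CLI; the CANDIDATES variable is hypothetical), one could require exactly one idle instance before treating it as Gru:

# Collect instances that report zero running tasks (candidates for Gru)
CANDIDATES=$(aws ecs describe-container-instances --cluster "$CLUSTER_NAME" \
  --container-instances $CONTAINER_INSTANCE_IDS \
  --query 'containerInstances[*].[ec2InstanceId,runningTasksCount]' --output text \
  | awk '$2 == 0 {print $1}')

# If a minion crashed, two instances match here; their IDs (and later the
# looked-up public IPs) end up concatenated, producing the illegal address above
if [ "$(echo "$CANDIDATES" | wc -w)" -ne 1 ]; then
  echo "Expected exactly one instance with zero running tasks, got: $CANDIDATES" >&2
  exit 1
fi
GRU_INSTANCE_ID=$CANDIDATES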

@dsperling dsperling added the bug label Mar 20, 2018
@ivanrodjr
Author

Great, it is even more impressive... it was just 1 instance. I had the compose file like this:

  • MINION_COUNT=1

and on AWS all I saw was 2 instances, which is weird. I will keep checking to see if something is wrong on my end.

@dsperling
Member

Oh. Take a look at the JMeter cluster in your ECS console and see how many registered instances you have. You can compare that to the output of the following command line, which is used by Lucy to grab all container instances in the created cluster.

aws ecs list-container-instances --cluster JMeter
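
If you just want a count instead of the full ARN list, a JMESPath query on the same call should work (a sketch; this is not a command Lucy itself runs):

# Count the container instances registered in the JMeter cluster
aws ecs list-container-instances --cluster JMeter \
  --query 'length(containerInstanceArns)' --output text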

@dsperling
Member

It looks like there are two issues here:

  1. Your cluster has 125 registered instances when you expected 1 minion
  2. After running a second test, there were 2 instances without any running tasks. The Lucy script incorrectly created an IP address of 54.233.252.21754.207.105.77 for Gru.

I have a fix for number 2 above, but I am wondering how issue 1 occurred. Any additional information to help troubleshoot this?

dsperling added a commit that referenced this issue Mar 21, 2018
…ommand line

Fixed bug that was partially responsible for issue #20
Added JMETER_MEMORY documentation
@dsperling dsperling self-assigned this Mar 21, 2018
@ivanrodjr
Author

Sorry, I could not find any additional info, but thanks, I see you fixed some of it.

@dsperling
Member

Thanks for the feedback. I am closing this and merging PR #21 today.
