Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding kompose up/down tests for openshift #460

Merged
merged 1 commit into from
May 2, 2017

Conversation

ashetty1
Copy link
Contributor

@ashetty1 ashetty1 commented Mar 1, 2017

Adding kompose up/down tests for openshift as part of #323.

The script runs kompose up/down using etherpad docker-compose file with oc cluster up.

Makefile Outdated
@@ -83,13 +89,13 @@ check-vendor:

# Run all tests
.PHONY: test
test: check-vendor validate test-unit-cover install test-cmd
test: check-vendor validate test-unit-cover install test-cmd test-openshift
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that test-openshift should be part of make test

export $(cat ${KOMPOSE_ROOT}/kompose/script/test/fixtures/etherpad/envs)

# Run kompose up
kompose --emptyvols --provider=openshift -f $KOMPOSE_ROOT/kompose/script/test/fixtures/etherpad/docker-compose.yml up; result=$?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to separate parts that are running tests from parts that are doing openshift setup.
Tests should be in different files from setup script

@kadel
Copy link
Member

kadel commented Mar 1, 2017

please add make test-openshift to travis.yml (after make test) so we can see it in action ;-)

@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA.

Once you've signed, please reply here (e.g. "I signed it!") and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Mar 1, 2017
@ashetty1
Copy link
Contributor Author

ashetty1 commented Mar 2, 2017

I signed it!

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Mar 2, 2017
@cdrage
Copy link
Member

cdrage commented Mar 2, 2017

A lot of the ; can be removed from the bash file

exit 1;
fi

oc login -u system:admin;
Copy link
Member

@surajssd surajssd Mar 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to be system:admin for everything, it's only needed when creating pv.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will fix this.

@@ -0,0 +1,94 @@
#!/bin/bash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing license


convert::kompose_down_check

if [ $(oc get pods | wc -l ) == 0 ] ; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be part of kompose_down_check? Does it make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will add it as part of the same function

sleep 60;

retry_up=0
while [ "$(oc get pods | grep foo | grep -v deploy | grep -v build | awk '{ print $3 }')" != 'Running' ] ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly same block is repeating in other test cases. It might be better to have this as function in libs.sh

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, but then I am not sure if the same logic could be used for every case. So wasn't sure about adding it into libs.sh

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should definitely go there, it can have some parameters to make it more generic

sleep 5;

# Check if all the pods are up
if [ "$(oc get pods | grep foo | grep -v deploy | grep -v build | awk '{ print $3 }')" == 'Running' ] &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly same block is repeating in other test cases. It might be better to have this as function in libs.sh

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kadel Same comment as above: I thought about it, but then I am not sure if the same logic could be used for every case. So wasn't sure about adding it into libs.sh

Let me know if you think otherwise.


# Check if all the pods are up

if [ "$(oc get pods | grep redis | grep -v deploy | awk '{ print $3 }'| grep 'Running' | wc -l)" -eq 2 ] && [ "$(oc get pods | grep web | grep -v deploy | awk '{ print $3 }' | grep 'Running' | wc -l)" -eq 2 ] ; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also candidate for creating function from it and moving it to libs.sh


if [ $exit_status -ne 0 ]; then
convert::print_fail "kompose down failed"
exit 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove ;

exit 1;
fi

sleep 60;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove ;



# Wait
sleep 60;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove ;

exit 1;
fi

sleep 60;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove ;

done

# Wait
sleep 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove ;

fi

# Wait
sleep 60;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove ;


if [ $exit_status -ne 0 ]; then
convert::print_fail "kompose up has failed"
exit 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove ;


if [ $exit_status -ne 0 ]; then
convert::print_fail "Kompose down failed"
exit 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove ;

@kadel
Copy link
Member

kadel commented Mar 23, 2017

I just noticed that tests take 22mins now :-(
If I remember correctly there is 30min timeout in travis-ci, so we are getting dangerously close to that :-(

I guess that oc cluster up is taking long as is downloading docker images.

Is possible to cache those images on travis-ci?
What if we move this test to ci.centos.org. Is it going to be faster? Will it be possible to cache images there?

Otherwise it looks good. Thank you for creating kompose_up_check Tests look much cleaner now

@ashetty1
Copy link
Contributor Author

@kadel what I could do is not have run oc cluster up for every test case. We should have it run it only once.

@cdrage
Copy link
Member

cdrage commented Mar 28, 2017

There's over 7 60 second sleep calls as well as 30 second and 5 second sleep calls all over the place.

@ashetty1 here is some reference code: https://github.com/projectatomic/adb-tests/blob/master/cdrage-atomicapp-ci/functional-tests/providers/openshift.sh

as well as: https://github.com/projectatomic/adb-tests/blob/master/cdrage-atomicapp-ci/functional-tests/

Where we used curl + grep + grepping oc and kubectl to reduce this time. Only using sleep when absolutely necessary (oc new-project lags sometimes)

At the moment, this will take wayyyyy too long to run the tests each time. It already takes 10 minutes to run our ordinary tests, but with these tests it'll run even longer.

With the length of the tests, I'd suggest using the CentOS CI instead so we can run both in parallel. I originally suggested just Travis CI, but lately Travis has been running slow and it doesn't help that we're adding more and more unit / integration tests each day!

@ashetty1
Copy link
Contributor Author

@cdrage the sleep has been used only to provide some lag while the pods are being brought up. I tried reducing the sleep time to 10, and then modified the kompose_down_check to look for "Terminating' status; with both these changes, the tests took about 12mins. If this looks okay, I shall push the changes her: https://github.com/ashetty1/kompose/blob/entrypoint_cmd/script/test_in_openshift/lib.sh#L157

@cdrage
Copy link
Member

cdrage commented Mar 30, 2017

@ashetty1
It would be much better if you wrote a bash loop instead of using sleep as it'll really cut down on the deployment times. It's hard to rely just on sleep as we don't know exactly what box the CI test is being ran on at TravisCI and there will be a lot of false positives / having to reset the build.

@cdrage
Copy link
Member

cdrage commented Mar 30, 2017

@ashetty1
Copy link
Contributor Author

@cdrage Sure, I could get rid of all the sleeping, but then the only issue is about handling failures. How do you think we should handle a scenario where pods aren't getting deleted/terminated? That's the reason I have the retry count there. Sleep is useful there, I think.

@cdrage
Copy link
Member

cdrage commented Mar 30, 2017

In regards to actually clearing / deleting the pods to run the next test, we didn't run into any of those issues.

Besides, if a previous test didn't terminate correctly (kompose down) then the tests are working as intended!

Just have a look at how we ran all the tests here: https://github.com/projectatomic/adb-tests/tree/master/cdrage-atomicapp-ci/functional-tests and you'll see how we did it in terms of terminating / rebuilding the cluster to a clean-slate for each test being ran.

wget https://github.com/openshift/origin/releases/download/v1.4.1/openshift-origin-client-tools-v1.4.1-3f9807a-linux-64bit.tar.gz -O /tmp/oc.tar.gz 2> /dev/null > /dev/null
mkdir /tmp/ocdir && cd /tmp/ocdir && tar -xvvf /tmp/oc.tar.gz > /dev/null
sudo mv /tmp/ocdir/*/oc /usr/bin/
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that all of this could be a serious issue in terms of what's being ran on the host system..

Shouldn't we error out instead if there's no oc on the host system? All of these sudo commands as well as replacing parts of the DOCKER_OPTS (I know that I have custom stuff in there that I don't want overridden / changes)

Obviously for Travis / CentOS CI we'll have this wget magic here, but not for testing.

@kadel do you agree?

Copy link
Contributor Author

@ashetty1 ashetty1 Apr 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdrage This is specific to travis. What I could do is set an env variable "travis=true" in travis.yml, and have that checked before running this function. Does that make sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does make sense, but people may want to run this test on their host system. I know I'd like to, instead of having to rely on Travis each time. @kadel ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdrage we exit with a message saying they need to have the oc binaries installed?

Copy link
Member

@cdrage cdrage Apr 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ashetty1 yes, we remove these sudo commands and mv commands and replace them with a prompt in regards to installing oc. If it's ran within travis (the TRAVIS=true param thing), then we should go ahead and install.

Copy link
Contributor Author

@ashetty1 ashetty1 Apr 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, install_oc_client should be run only if its run within travis. If you make sure that its run only there than its ok.

sleep 60

convert::kompose_down_check

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these files are mostly copy-and-paste of all the functions. Is there a way to simplify these into a function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some sense, yes. But writing a function would mean writing a wrapper for kompose operations. I thought it would be too broad/diverse to handle within a single function.

sleep 5
convert::oc_cleanup
done
convert::oc_cluster_down
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

if [ $retry_up -lt 10 ]; then
echo "Waiting for the pods to come up ..."
retry_up=$(($retry_up + 1))
sleep 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reduce this to 1 and instead try 120 times (so we wait 2 minutes?)

echo "Waiting for the pods to go down ..."
oc get pods
retry_down=$(($retry_down + 1))
sleep 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reduce this to 1 and instead try 120 times (so we wait 2 minutes?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@cdrage
Copy link
Member

cdrage commented Apr 4, 2017

github.com/kubernetes-incubator/kompose  pr_460 ✔                                                                                                                                                                                                                         3h34m  
▶ make test-openshift 
./script/test_in_openshift/run.sh


===> Starting test <===
Functional tests on OpenShift
./script/test_in_openshift/run.sh: line 29: oc: command not found
FAIL: Please install the oc binary to run testsMakefile:62: recipe for target 'test-openshift' failed
make: *** [test-openshift] Error 1

I think there should be a newline after saying the FAIL error.

@cdrage
Copy link
Member

cdrage commented Apr 4, 2017

@ashetty1

Doesn't work it seems if a server isn't already up:

github.com/kubernetes-incubator/kompose  pr_460 ✔                                                                                                                                                                                                                        3h36m  ⍉
▶ make test-openshift 
./script/test_in_openshift/run.sh


===> Starting test <===
Functional tests on OpenShift
error: server took too long to respond with version information.
FAIL: Please install the oc binary to run testsMakefile:62: recipe for target 'test-openshift' failed
make: *** [test-openshift] Error 1

I believe you should be checking if the binary exists in /usr/local/bin or /usr/bin/ instead of checking via oc version

@ashetty1
Copy link
Contributor Author

ashetty1 commented Apr 4, 2017

@cdrage
Copy link
Member

cdrage commented Apr 4, 2017

@ashetty1 Yes. Checking if OC is there instead of oc version.

@cdrage
Copy link
Member

cdrage commented Apr 5, 2017

@ashetty1

Seems like I'm getting another error:

▶ make test-openshift 
./script/test_in_openshift/run.sh


===> Starting test <===
Functional tests on OpenShift
FAIL: oc cluster up failed.
Makefile:62: recipe for target 'test-openshift' failed
make: *** [test-openshift] Error 1

There's no output for oc cluster up when using make test-openshift. May be a good idea in order to diagnose issues.

Of course, there's an error in oc (I know what the issue is, was just testing it out):

github.com/kubernetes-incubator/kompose  pr_460 ✔                                                                                                                                                                                                                                                                                                                   3h59m  ⍉
▶ oc cluster up
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... FAIL
   Error: Minor number must not contain leading zeroes "03"

Which is unrelated, but of course, it'd be nice to have the output from the Makefile so people can diagnose why they can't run make test-openshift

@cdrage
Copy link
Member

cdrage commented Apr 5, 2017

Another note is that it would be nice to see the output of oc cluster up regardless if it errors or not.

@ashetty1
Copy link
Contributor Author

ashetty1 commented Apr 7, 2017

@kadel , @cdrage let me know if this looks good now.

@surajssd
Copy link
Member

It works for me except for the kompose down issue in #382
otherwise it's good

@procrypt
Copy link
Contributor

@ashetty1 I was looking into the issue why pods are not getting deleted, I found out that the label io.kompose.service is not getting populated into the podSpec when we use buildConfig and since we are deleting the objects based on labels, kompose down fails to delete the pods.

podSpec of pods created with buildConfig

$ oc describe pods foo-1-build 
Name:                   foo-1-build
Namespace:              ns
Security Policy:        privileged
Node:                   192.168.121.71/192.168.121.71
Start Time:             Thu, 20 Apr 2017 11:30:02 +0000
Labels:                 openshift.io/build.name=foo-1
Status:                 Failed
IP:                     172.17.0.2
Controllers:            <none>

As you can see there is different label openshift.io/build.name=foo-1 not io.kompose.service, this is the reason why kompose down fails to delete it.

The issue is with OpenShift so I don't how long it will take to resolve this.

@ashetty1
Copy link
Contributor Author

@cdrage , @surajssd if the issue is with OpenShift, should we unblock and merge this PR?

@surajssd
Copy link
Member

@ashetty1 I would recommend commenting that test case, create an issue that tracks that commented test case and then I guess we can proceed.

Once we know how to delete that undeleteed pod we uncomment that test.

@cdrage
Copy link
Member

cdrage commented Apr 21, 2017

@ashetty1 Got to make sure that even though pods don't get deleted, that after oc cluster down.

@ashetty1 ashetty1 force-pushed the openshift_up_down_test branch from 71c5143 to e5d3eaf Compare April 24, 2017 09:35
@ashetty1
Copy link
Contributor Author

@surajssd have taken buildconfig tests off this commit.

@surajssd
Copy link
Member

@ashetty1 please correct me if I am wrong here, but as I see it there are two issues with this PR.

First issue: there is a problem with buildconfig that even if buildconfig has label the build pod does not have the label. So it is not deleted. And it is tracked at #382

So I would recommend that we keep this test, and just comment it out. In issue #572 we say that we need to uncomment this test once issue #382 is fixed.

Second issue: another problem with buildconfig is that if the git repo is on some random commit the buildconfig generated is weird and it fails. And it is tracked at #561

Here what we can do is not rely on the test case example in the repo but just git clone some git repo in temporary directory and then try to run buildconfig test there so that we will get consistent results. So this does not need any special issue to track anything.

@ashetty1 ashetty1 force-pushed the openshift_up_down_test branch from e5d3eaf to 5f8a44a Compare April 24, 2017 14:25
@ashetty1
Copy link
Contributor Author

@surajssd I have just commented out the kompose down part now for buildconfig tests.

convert::kompose_up_check -p foo

# Kompose down for buildconfig fails being tracked at #382
# convert::kompose_down $docker_compose_file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the one you have commented right? @ashetty1

Copy link
Contributor Author

@ashetty1 ashetty1 Apr 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@surajssd Yes, that's right.

@surajssd
Copy link
Member

@cdrage this looks good to me see if you are good with it go ahead and merge it

@cdrage
Copy link
Member

cdrage commented Apr 24, 2017

Good enough! Although a new issue will have to be opened to re-add buildconfig testing once the other PR has been merged.

@surajssd
Copy link
Member

@cdrage so there is one issue for this, that needs some work, i was gonna point to the line number of the merged code sayig uncomment this or something like that once this is merged in here #572

@ashetty1
Copy link
Contributor Author

@kadel, could you please take a look?

wget https://github.com/openshift/origin/releases/download/v1.4.1/openshift-origin-client-tools-v1.4.1-3f9807a-linux-64bit.tar.gz -O /tmp/oc.tar.gz 2> /dev/null > /dev/null
mkdir /tmp/ocdir && cd /tmp/ocdir && tar -xvvf /tmp/oc.tar.gz > /dev/null
sudo mv /tmp/ocdir/*/oc /usr/bin/
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, install_oc_client should be run only if its run within travis. If you make sure that its run only there than its ok.

replica_2=$replica
fi

pod_1=$( echo $pod | awk '{ print $1 }')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this means that you can't check more than two pods?

Copy link
Contributor Author

@ashetty1 ashetty1 Apr 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kadel For now, yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that :-(
If you are not going to fix that, please write some big warning that this is the limitation and also TODO that this needs to be fixed. It will cause a lot of confusion for anyone writing tests.

sudo mv /bin/findmnt /bin/findmnt.backup
sudo /etc/init.d/docker restart
# FIXME
wget https://github.com/openshift/origin/releases/download/v1.4.1/openshift-origin-client-tools-v1.4.1-3f9807a-linux-64bit.tar.gz -O /tmp/oc.tar.gz 2> /dev/null > /dev/null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v1.5 was released, lets use that

fi
}

function convert::kompose_down_check () {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add comment explaining what parameter is required for this function and what it means?

@ashetty1 ashetty1 force-pushed the openshift_up_down_test branch from 5f8a44a to dba26c3 Compare April 26, 2017 11:11
}


function convert::kompose_up_check () {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function can cause a lot of confusions, can you please add comment explaining what are the limitations of this function and what it does?

@kadel
Copy link
Member

kadel commented Apr 28, 2017

One last stupid question 😇
Why is every function prefixed with convert::?

* This PR adds functional tests for kompose up/down. The test scripts
are hosted under script/test_in_openshift. The directory structure,
as follows:

	script/test_in_openshift/
	├── compose-files
	│   └── docker-compose-command.yml
	├── lib.sh
	├── run.sh
	└── tests
		├── buildconfig.sh
    		├── entrypoint-command.sh
    		├── etherpad.sh
    		└── redis-replica-2.sh

* script/test_in_openshift/run.sh is the master script
  which executes all the tests

* script/test_in_openshift/lib.sh consists of helper functions
  for `kompose up` and `kompose down` checks

* script/test_in_openshift/tests directory consists of test scripts

* The scripts use 'oc cluster up' for setting up a single-machine
  OpenShift cluster. It exits if oc binaries are not installed

* Most of the docker compose files used are the ones already
  available in examples/ or script/test/fixtures.

* How to run the tests: 'make test-openshift'
@ashetty1 ashetty1 force-pushed the openshift_up_down_test branch from dba26c3 to 1b3d876 Compare May 2, 2017 06:32
@ashetty1
Copy link
Contributor Author

ashetty1 commented May 2, 2017

@kadel just used the convention that existed. No other reason :)

Have made the changes you requested.

@kadel
Copy link
Member

kadel commented May 2, 2017

@kadel just used the convention that existed. No other reason :)

OK, it looks a bit strange there, but apparently no one else had has any issues with that, so lets keep it ;-)

Thank you for changes 👍

LGTM

@kadel kadel removed the blocked label May 2, 2017
@kadel kadel merged commit abf8192 into kubernetes:master May 2, 2017
@ashetty1 ashetty1 deleted the openshift_up_down_test branch May 3, 2017 09:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants