AWS Dev
This page is intended for developers of AmbariKave. For users, return to Home.
This page is an introduction/tutorial to working with AWS as a developer in our team.
We have two basic choices when it comes to developing new features of AmbariKave: do we want to do it locally with a Vagrant setup, or remotely on Amazon? In principle Vagrant can be faster, but Amazon lets us run many more machines in parallel and is integrated into our existing testing framework, so it is more automated.
Have you completed the Onboarding checklist?
OK, then you must have:
- A linux machine running somewhere that can ssh out over port 22
- (your admin will probably have followed the NewDevMachine instructions)
And on this machine you must have:
- Your favorite IDE (eclipse?)
- Git keys for committing to the git repo, uploaded to GitLab and stored on this machine
- A checkout of the head of AmbariKave on that machine to start your development
- A note of your aws secret ID/key
- A keypair (ssh-keys) uploaded to amazon and stored on this machine
Our AWS account can be reached through: https://kpmg-ta.signin.aws.amazon.com/console
If you have never done this before, you need to install the awscli client:
sudo su
bash -l # if bash is not your default shell
source /opt/KaveToolbox/pro/scripts/KaveEnv.sh #if KaveToolbox is installed
conda update conda
pip install awscli #If pip is not installed in the above step, use option # 2 to install pip [here](https://www.liquidweb.com/kb/how-to-install-pip-on-centos-7/)
pip install autopep8
yum -y install pdsh # if this doesn't work, use [this](https://www.youtube.com/watch?v=Uh2Jcl6QiUc) to install pdsh from [here](https://code.google.com/archive/p/pdsh/downloads)
yum -y install pdsh-mod-dshgroup
yum -y install bind-utils
exit # exit login shell
exit # exit sudo su
Then exit from the sudo shell and configure awscli (entering your AWS access key ID and secret key):
aws configure
Our default region for Europe is eu-central-1; for Asia, using an Amazon Windows workspace, it is ap-northeast-1; for India, using a local dev machine or Linux workspace, it is ap-south-1.
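For example, a typical aws configure session from a European dev machine looks roughly like this (the key values shown are placeholders, substitute your own):
aws configure
AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Default region name [None]: eu-central-1
Default output format [None]: json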
Well, let's see! The tests will tell you if it works or not!
- 1 Go to the tests directory of the project and run the unit tests (<1 second). If these pass then you at least have aws cli and all python libraries installed correctly.
./test.sh unit/all.py
If this does not work, then contact someone in the dev team to help you out. Most common problems here are with pep8.
If you didn't do this already, you'll need a security config file. A security config file is a JSON file saved somewhere on your system, and its contents look like, BUT ARE NOT IDENTICAL TO, the following:
{
"SecurityGroup" : "sg-900aabbe",
"Subnet" : "subnet-900aabbe",
"AccessKeys" : { "AWS" : { "KeyFile" : "/some/aws/key/file", "KeyName" : "yourkeyname" },
"SSH" : { "KeyFile" : "/some/aws/key/file" },
"GIT" : { "Origin" : "https://github.com/KaveIO/AmbariKave.git", "KeyFile" : "/some/git/key/file"}
},
"Tags" : {"Project" : "AmbariKave", "User" : "someusernameblah"}
}
- You will need to modify this file to meet your needs: get a recent file from another developer to start from
The security config file saves all the information needed by our scripts to work out where to put your test machines. You only need to create this file once, and the scripts will take care of the rest. The SecurityGroup and Subnet probably already exist for what you want to do. Ask an existing developer for help if you're not sure what to put here. If you want to know more about this file, check the AWS CLI page.
The "KeyFile" variables listed here are the local path to where you have stored your private keys to access git and amazon. These are standard .pem SHA/RSA/open-ssh keys of the type that you would have downloaded from amazon, or git, or that you created yourself.
NOTE ON KEYS: For the purposes of these tests, your key files cannot be password-protected. This is because the keys need to be copied around the test machines to download software and automate ssh command scripts. It is therefore a good idea to use these keyfiles only in the tests, and use something different for real deployments.
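If you need a fresh, password-free keypair just for these tests, a minimal sketch is the following (the file name ~/.ssh/kave_test_key is only an example):
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/kave_test_key  # -N "" means no passphrase
# upload ~/.ssh/kave_test_key.pub to amazon (EC2 console -> Key Pairs -> Import key pair)
# and point the SSH/AWS "KeyFile" entries in your security config at the private key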
This file can be automatically picked up through the environment variable AWSSECCONF, and you should add this to your .bashrc file:
if [ -z "$AWSSECCONF" ]; then
export AWSSECCONF="path/to/my_security_config.json"
fi
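A quick, optional sanity check that the variable is set and the file is valid JSON (this snippet is just a convenience, not part of the repo):
echo $AWSSECCONF                     # should print the path you just exported
python -m json.tool "$AWSSECCONF"    # complains if you left a stray comma or quote in the file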
First run:
./test.sh deployment/one_centos_dev.py --verbose # old script was centos7.py
This test creates one machine in aws, in the security group that you specified in your AWSSECCONF, with the keys specified in your AWSSECCONF. This should take around three minutes.
- If this fails instantly it will be because your AWSSECCONF is completely incorrect
- If this takes a very long time and then fails, something went wrong in contacting the machine, this is most likely one of two problems:
- (1) your IP address has not been correctly added to default whitelists: try copy-pasting the ssh command; if this times out, this is the problem, so contact a dev to help you fix it
- ssh command: use the ssh command given below from your linux machine to log in to the new machine created by the script above.
- ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i pem_file_path root@ip_address_of_machine_created
- pem_file_path : the path to the private key stored on your linux machine (the -i flag expects the private key, not the public one).
- ip_address_of_machine_created : the IP address of the new machine that you want to log in to from your current linux machine.
- (2) what you have put in your AWSSECCONF with respect to SSH keys is incorrect: try copy-pasting the ssh command; if you are prompted for a password, this is the problem. Contact a dev to help; maybe you've used password-protected keyfiles, or mixed up which public/private key is which.
Second, if this works correctly, run:
./test.sh deployment/micro_cluster.py --verbose
This test tries to make a teeny tiny cluster of machines in a new VPC on aws and sees if you can contact it.
- If this fails instantly it will be because your AWSSECCONF is completely incorrect, or because you don't have the correct permissions in aws
- If this takes a very long time and then fails, something went wrong in contacting the machines in this cluster, and since you know from above that most of your AWSSECCONF is fine, then the default whitelist settings in our cloudformation script must be wrong and need fixing for your IP range. Contact a lead developer to fix this.
- If this fails in CloudFormation with "A client error (SignatureDoesNotMatch) occurred when calling the CreateStack operation: Signature not yet current", this is caused by your local system clock not being in sync with the actual time. Fix this by installing ntp on CentOS (see the sketch below).
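A minimal sketch of the ntp fix on a CentOS 7 machine (run as root; chrony would work equally well):
yum -y install ntp
ntpdate -u pool.ntp.org   # one-off sync to correct a large offset (-u avoids clashing with a running ntpd)
systemctl enable ntpd
systemctl start ntpd      # keep the clock in sync from now on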
Third, if this works correctly, run:
./test.sh deployment/single_machine_cluster.py --verbose
This creates one machine on aws, and installs the git head of ambari onto this machine. If this does not work it is almost certainly because your git keys in your AWSSECCONF are incorrect, or you have uploaded them to the wrong git server.
Finally kill the machines you created:
./kill_recent_tests.py
Once this all works, time to put everything together and create a dev image. A dev image is an aws image file which is used to speed up certain tests.
A dev image is a system image of a Centos machine which already has the head of our ambari release pre-installed and ready to go. It also has a running ambari with some "default" single-machine cluster.
Creating this image takes around 5 minutes, and it saves 5 minutes from every test cycle you will do later, so it is definitely worth it. Unfortunately, though, you'll need to make a new image whenever there is a major change in the development or whenever you change your amazon keys.
./AmbariKave/deployment/aws/new_dev_image.py
This will output an image name, which you can then add to your bashrc and forget about it until you need a new image:
if [ -z "$AMIAMBDEV" ]; then
export AMIAMBDEV="ami-#########"
fi
If this does not work, it means something is still wrong in your AWSSECCONF, or elsewhere. Go back to the testing phase or ask a dev to help. It takes a while (20 minutes) for an ami to be completely registered in aws. Once this is done, you can continue.
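If you want to check whether the AMI has finished registering, you can poll it with the aws cli; the state should eventually become "available":
aws ec2 describe-images --image-ids $AMIAMBDEV --query 'Images[0].State' --output text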
First run:
./test.sh deployment/one_centos_dev.py --verbose
This will create one new machine with your centos dev image and check that you can contact it.
- If the ami does not exist: either wait a little longer for it to be created or contact another developer
Finally kill the machines you created:
./kill_recent_tests.py
Also go to the aws console web interface and kill any machines in your name called new_dev_image; these are not killed automatically by the script.
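If you prefer the command line over the console, something like the following can help locate and terminate them, assuming the machines carry a Name tag containing new_dev_image (double-check the instance ids before terminating anything):
aws ec2 describe-instances --filters "Name=tag:Name,Values=*new_dev_image*" --query 'Reservations[].Instances[].[InstanceId,State.Name]' --output table
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0   # substitute the id(s) found above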
Well, let's see! The tests will tell you if it works or not!
- 1 Go to the tests directory of the project and run the unit tests (<1 second). If these pass then you at least have aws cli and all python libraries installed correctly.
./test.sh unit/all.py
- 2 Run the deployment tests (~2 minutes). If these pass it will tell you that you have a working dev image and security config file. While this is running you can observe the ec2 dashboard and watch new instances being created.
./test.sh deployment/all.py
- 3 Try testing a service that should really work. This will ensure the master branch, and/or your own branch, is working well. In the ec2 dashboard you can now find the ip-address of the machines being created.
./test.sh service/remote_service_with_servicesh.py APACHE --verbose
#./test.sh service/remote_service_with_blueprint.py KAVELANDING # currently buggy
Don't forget to kill your tests at the end:
./kill_recent_tests.py
- Re-occurring timeout when contacting newly created machines:
  - (a) this happens if you didn't choose the correct private key for ssh connections, the one matching the keypair specified in your security config file; take another look at this file, and ask for help if needed. ./connect_to.py would also show a very similar error.
  - (b) this also happens if the IP address of the computer you are on is not correctly added to the exceptions of the security group you are trying to contact. This should have been sorted out by the person who upped the right machine for you. Take another look at the inbound rules for the security group, or get in touch with an expert.
  - (c) this also happens if the subnet to which you're upping machines is not configured to auto-assign public IP addresses. Take another look at the configuration of this subnet, or get in touch with an expert.
- Asking for a password for a newly created machine: exactly the same problem as (a) above; not sending the correct keyfile/keypair often causes ssh to fall back to asking for a password, but most often there is no such password, and therefore you have no idea what to do.
- Client error when using CloudFormation: "A client error (SignatureDoesNotMatch) occurred when calling the CreateStack operation: Signature not yet current" is caused by your local system clock not being in sync with the actual time. Fix this by installing ntp on CentOS, as in the sketch earlier on this page.
Accidentally stopping your own dev box through amazon will release the IP associated with it, which means we need to add yet another ip-exception to our firewall. Avoid this, since it takes some time to sort out.
Once a week we will review and stop/terminate running machines on aws, but it is cheaper/safer by far if you stop/terminate machines as soon as you no longer need them, especially if they are very large.
There are some helper scripts for this, e.g.:
./tests/kill_recent_tests.py
If the test machines are older than 6 hours, you can add an argument to specify that machines as old as 23 hours should be killed.
./tests/kill_recent_tests.py 23
If the word 'test' does not appear in the name of the machines, you will need to kill them yourself, which is a three-step process.
- Terminate the machines through the ec2 web interface
- Navigate to the CloudFormation interface and delete any cloudformation stacks associated with these tests (a CLI sketch for this is given below)
- Run ./tests/kill_recent_tests.py to clean up the remaining volumes.
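For step 2, a command-line sketch of the same thing (list the stacks first and double-check the names before deleting anything):
aws cloudformation list-stacks --stack-status-filter CREATE_COMPLETE --query 'StackSummaries[].StackName'
aws cloudformation delete-stack --stack-name name-of-your-test-stack   # repeat for each leftover stack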
Any new service starts off with an exploratory installation. Here you will want a new blank centos machine that you can log in to and use to test the installation of a new service without ambari or any other service installed.
- Create your test machine:
./AmbariKave/deployment/aws/deploy_one_centos_instance.py Explore-Service-XXXX --verbose --not-strict
(replace XXXX with your own service name) This script will finish by telling you how to connect to the new centos image over ssh.
- SSH to this machine
- Perform the installation as you like, noting the steps that you take, the tutorials you followed, and how you verified it was working at the end
Output: a short list of installation instructions or a short script appended to the correct JIRA task.
We perform feature development using independent branches. Creating a branch in git is very easy. First of all, start your branch from the latest pulled master from the main repository.
Branch off of the master into a branch called after the feature you are designing for, such as "FREEIPA". If the feature name matches the name of a service you want to add, this will make things much clearer and make tests much simpler. In eclipse IDE this is very simple to do, right-click on the project, Team->Switch To->New Branch.
Since this is your own branch, you can commit to it as often as you like. In order to keep up to date with any changes on the master branch, we encourage regular merging of the master changes onto your branch, and rebasing as necessary; even though this can mess up the commit history a little bit, it's better than falling very far behind.
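If you prefer the command line to the eclipse menus, a minimal sketch of the same workflow (using FREEIPA as an example feature name, and assuming your remote is called origin):
git checkout master && git pull       # start from the latest master
git checkout -b FREEIPA               # create and switch to the feature branch
git push -u origin FREEIPA            # publish the branch so other machines can use it
# ...develop and commit...
git fetch origin
git merge origin/master               # regularly bring master changes onto your branch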
In principle, as a developer of a new feature, someone else will need to review and test your code before it can be merged back onto the master, ready for production.
If this is a new service you are adding, you will need to create a new service directory, and the easiest way to do that is by copying an existing similar service and reviewing/changing the names.
Output: a branch named smartly after a feature where you can do your own development.
Creating a new service usually means implementing the following three pieces:
- Wrap the installation in a script
- Template all configuration files
- Review, add, or remove configurable parameters to ensure the most logical set is available
Independent of what you are doing here, you can use the same basic testing and development framework for this:
- Editing in your IDE and committing on your branch
- Unit testing locally where possible
- Testing the deployment over-and-over again within a dedicated centos instance or cluster
- Testing the deployment on a fresh centos instance or cluster
- Testing the deployment using the standard test scripts
The simplest way of testing a new service is actually the last of these options, using the standard test scripts, where you can call something as simple as:
#up a new machine and try installing your service there with service.sh (nothing other than service.sh needed):
test.sh service/remote_service_with_servicesh.py MYSERVICE --this-branch
#up a new machine and try installing your service there with a blueprint you already wrote
test.sh service/remote_service_with_blueprint.py MYSERVICE --this-branch
#up an entire cluster and deploy a huge blueprint across it (providing you already wrote the blueprint):
test.sh integration/clusters.py MYSERVICE --this-branch
Output: A branch containing code which in principle installs your service, or some other small well-tested improvement
Since you are working on an independent branch you can commit and push to and from the gitlab source to implement your new feature iteratively. This is very helpful to maintain your code across multiple different machines which is needed for some services or features.
This is very simple:
#run some simple unit tests which in principle check that code is OK
tests/test.sh tests/unit/all.py
You can add your own unit tests if you're changing things in the framework, such as modifying what's in kavecommon.py, but actually testing a full service needs a remote machine to test the installation on, and that is something you just can't do in a very fast way.
You can up one centos machine in around 120 seconds with:
./AmbariKave/deployment/aws/deploy_one_centos_instance.py Develop-Service-XXXX --ambari-dev --verbose --not-strict
This machine will already have the dev image you made earlier, and therefore have a running instance of ambari already. SSH to that machine to start your development cycle.
If your service needs more than one machine to make sense, this is more complicated, and you'll need to create a whole cluster instead, by defining a cluster file and then calling:
./AmbariKave/deployment/aws/up_aws_cluster.py --verbose
- SSH to the development machine ambari node
- Make sure you pull the latest version of your own code and restart the ambari server with:
./ambarikave/dev/pull-update.sh [branch name of your branch]
- There are then two ways you can attempt the first installation
- Through the web interface. Find out the ip address of this ambari node and navigate to http://ipoftheambarinode:8080/
- Via service.sh: if implemented, service.sh will let you do a very simple installation, but this only works smoothly if your installation requires no parameters by default
./ambarikave/bin/service.sh install MYSERVICE -h ambari.kave.org
- Once this has been called, you can monitor the installation progress through the web interface http://ipoftheambarinode:8080/
- If the installation fails, a lot of the time it is quite possible to restart it through the web interface itself, and this can speed up the development cycle for you.
- The cycle is then:
#edit the code that you wish in your IDE or locally on that machine
./ambarikave/dev/pull-update.sh [branch name of your branch]
#refresh the webpage, log in and restart the install
If your service is so complicated that it needs a blueprint to install it, this is also possible: you can create the same sort of instance with deploy_one ... --ambari-dev, and then you have to clean the ambari installation:
#on your development machine with aws access:
./AmbariKave/deployment/aws/deploy_one_centos_instance.py Develop-Service-XXXX --ambari-dev --verbose --not-strict
#on your remote ambari machine, clean the ambari installation
./ambarikave/dev/clean.sh
./ambarikave/dev/pull-update.sh [branch name of your branch]
Once your installation script seems to work on one machine, you must try it on a fresh machine, until you can install with no problems on a fresh machine.
Every night, Jenkins tries to run a lot of tests from the tests directory; this includes all sorts of test levels: unit, system, and integration-level tests. The testing framework has been written in a generic way so that you can use it as well, if you want, and this helps automate a whole lot more steps to make the development cycle faster.
#up a new machine and try installing your service there with service.sh (nothing other than service.sh needed):
test.sh service/remote_service_with_servicesh.py MYSERVICE --this-branch
#up a new machine and try installing your service there with a blueprint you already wrote
test.sh service/remote_service_with_blueprint.py MYSERVICE --this-branch
#up an entire cluster and deploy a huge blueprint across it (providing you already wrote the blueprint):
test.sh integration/clusters.py MYSERVICE --this-branch
- When a test needs a blueprint or cluster file, it expects to find that file in the correct tests directory, such as tests/service/blueprints and tests/integration/blueprints
- The tests monitor their own installation, failing and giving you feedback such that you don't need to actively monitor them yourself
Using a mixture of this approach alongside "Testing over and over again with a dedicated centos instance or cluster" can really speed things up.
Once you are satisfied that the code works for you on a fresh machine you will need to:
- Tidy up your code, remove useless commented-out sections, consider consolidating code into functions, add comments which document the code itself
- Merge the master onto your branch; this will take the latest changes from the master and ensure the person going to merge for you has the simplest time of it. If your code causes merge conflicts with the master, the person trying to merge for you is within their rights to simply not do it until you fix the conflicts
- Check the code after merging with standard tests if possible, at least run the unit tests which will check, for example, that you have added lines to service.sh
- Push your changes back to the repo on your branch
- Notify the team in the stand-up or through JIRA that your feature is ready for merging.
- Kill all the test machines you may have created.
./tests/kill_recent_tests.py
Output: Your feature, ready for merging to the master branch