You've reached the qa-reports deployment repository. It contains all the scripts needed to get an instance of SQUAD running in production.
We moved from Ansible-managed EC2 instances to a containerized deployment with Kubernetes on AWS EKS.
There are 4 environments available: dev, testing, staging and production. Production and staging both run on Fargate nodes in a single shared EKS cluster. The "testing" environment also runs in that cluster, but it is only used in specific testing scenarios. The dev environment uses 3 local virtual machines: 2 for the Kubernetes master/worker nodes and 1 for the PostgreSQL/RabbitMQ services.
Deploy a dev|testing|staging|production environment by running:
$ ./qareports production up
This should make sure all resources are created and running.
Run this command when there's a new release of SQUAD available in https://hub.docker.com/r/squadproject/squad/tags:
$ ./qareports production upgrade_squad [squad-release]
If squad-release (e.g. 1.16) is given, all SQUAD images are replaced with that specific tag. If no tag is given, qareports is upgraded to the latest tag on Docker Hub.
NOTE: this updates SQUAD code only; it doesn't update environment variables or any other setup in this repo.
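How upgrade_squad decides which tag is "latest" isn't described here. The helper below is a rough, hypothetical sketch that assumes "latest" means the highest numeric release tag. It reads tag names one per line, for example copied from the Docker Hub tags page above, and picks the highest one using GNU sort's version ordering:

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of the qareports script.
# Reads Docker Hub tag names on stdin, one per line, and prints the highest
# numeric release tag (so 1.16 beats 1.9). Requires GNU sort for -V.
pick_latest_tag() {
  # Drop non-release tags such as "latest", then sort by version number.
  grep -E '^[0-9]+(\.[0-9]+)*$' | sort -V | tail -n 1
}
```

For example, printf 'latest\n1.9\n1.16\n' | pick_latest_tag prints 1.16. You could then pass that value explicitly: ./qareports production upgrade_squad 1.16.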
Run this command when there's a change in any of the deployment files:
$ ./qareports production deploy
NOTE: this re-applies all configuration files, including environment variables, and updates SQUAD as the very last step.
Here is a list of handy commands to manage qareports:
- ./qareports production queues -w
  lists all queues in production and keeps watching them.
- ./qareports production pods
  lists all pods running in production.
- ./qareports production top
  lists CPU and memory stats for all pods.
- ./qareports production logs -f pod-name-with-hash
  displays logs for a given pod.
- ./qareports production logs -f -l app=qareports-worker
  displays logs from all "qareports-worker" pods.
- ./qareports production k describe pod pod-name-with-hash
  displays more details about a given pod.
- ./qareports production ssh pod-name-with-hash
  ssh into any pod in production. NOTE: for qareports-web pods, append -c qareports-web to the command, because there are two containers in this pod.
Here are some utility commands that might help debugging and accessing things:
- ./qareports dev up
  creates the k8s cluster and the RabbitMQ and PostgreSQL instance, and deploys SQUAD.
- ./qareports dev upgrade_squad
  upgrades the SQUAD docker image in all deployments.
- ./qareports dev destroy
  destroys the development deployment.
- ./qareports dev list
  lists all resources in the cluster; useful to discover pods.
- ./qareports dev ssh master-node
  ssh into the master node.
- ./qareports dev ssh qareports-listener-deployment-947f8d9b8-ntfww
  ssh into the pod running squad-listener. NOTE: be careful when running heavy commands on this pod. It's limited to a maximum of 512MB of RAM. Don't worry if it crashes, though: Kubernetes will just replace the crashed one with a new one in no time!
- ./qareports dev logs -f deployment/qareports-web-deployment
  gets the log stream of all pods under the qareports-web deployment.
- ./qareports dev k <kubectl-args>
  runs kubectl on the development environment.
- ./qareports dev k delete pod qareports-listener-deployment-947f8d9b8-ntfww
  deletes a bad pod. If a pod crashes and Kubernetes doesn't remove it (but it should have), deleting that pod forces a fresh one to be created.
The following tools are necessary to manage qareports. Make sure they are all installed and in your $PATH:
- terraform: tool for managing cloud resources, e.g. on AWS or GKE
  https://releases.hashicorp.com/terraform/0.11.14/terraform_0.11.14_linux_amd64.zip
- ansible: tool for automating node setup. Install it according to your distro: https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html
- kubectl: tool for managing Kubernetes clusters
  https://storage.googleapis.com/kubernetes-release/release/v1.18.0/bin/linux/amd64/kubectl
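Before running any qareports command, you can quickly confirm that the tools above are reachable with a small check like the one below. It's only a sketch: it looks each tool up with the shell builtin command -v and doesn't verify versions (for example, that terraform is 0.11.14).

```shell
#!/usr/bin/env bash
# Print every required tool that is not found in $PATH, one per line.
# Returns 1 if anything is missing, 0 otherwise.
missing_tools() {
  local tool
  local -a missing=()
  for tool in terraform ansible kubectl; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    printf '%s\n' "${missing[@]}"
    return 1
  fi
}
```

For example: missing_tools || echo "install the tools listed above first".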
Here are some notes taken while creating this repository.
There are many reasons why a pod might not start:
- the image doesn't exist
- there is no node with enough compute capacity available to schedule it
- staging and production pods are supposed to be scheduled on AWS Fargate, which only happens if the pods are deployed in the correct namespace (the one selected by the Fargate profile)
A useful command to see what is going on with a pod is:
./qareports dev describe pod pod-name
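When several pods are stuck, it can help to list the Pending ones together with their namespace first. A staging or production pod in the wrong namespace won't be picked up by Fargate. The sketch below assumes the default column layout of kubectl get pods --all-namespaces --no-headers (NAMESPACE, NAME, READY, STATUS, ...). It also assumes that ./qareports <env> k passes its arguments straight through to kubectl.

```shell
#!/usr/bin/env bash
# Reads "kubectl get pods --all-namespaces --no-headers" output on stdin and
# prints namespace/name for every pod whose STATUS column is "Pending".
pending_pods() {
  awk '$4 == "Pending" { print $1 "/" $2 }'
}

# Example against a live cluster (assumes "k" forwards args to kubectl):
#   ./qareports production k get pods --all-namespaces --no-headers | pending_pods
```

You can then run describe pod on each reported pod to see its scheduling events.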
Starting celery with '--without-mingle' stopped it from crashing every time a new worker was started in parallel. More info: https://stackoverflow.com/questions/55249197/what-are-the-consequences-of-disabling-gossip-mingle-and-heartbeat-for-celery-w
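For reference, the flag goes on the worker's command line. The line below is a sketch only. The app name (-A squad) and the queue (-Q ci_fetch) are assumptions about how the SQUAD workers are started in this deployment, so check the actual container command before copying it.

```shell
# Assumed worker invocation; only --without-mingle comes from the note above.
celery -A squad worker -Q ci_fetch --without-mingle
```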
Sometimes celery suffers from celery.chord_unlock tasks that never reach a timeout and cause the fetch-workers to do useless work. I still haven't found the reason for this, but until then, it's convenient to purge the ci_fetch queue:
- first, kill all fetch-workers
- ssh -i tmp/qareports_private_ssh_key $(cat terraform/generated/production_rabbitmq_host_public)
  (you'll need to run ./qareports production queues first)
- sudo rabbitmqctl purge_queue ci_fetch
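The steps above can be scripted roughly as follows. This is a sketch with a few assumptions:
- The fetch-worker deployment name is hypothetical. Check the real one with ./qareports production pods.
- Scaling the deployment to 0 replicas is one way of "killing all fetch-workers" so that Kubernetes doesn't respawn them.
- Setting DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the ci_fetch purge procedure.
# ASSUMPTION: hypothetical deployment name; override with FETCH_DEPLOYMENT=...
FETCH_DEPLOYMENT="${FETCH_DEPLOYMENT:-qareports-fetch-worker-deployment}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

purge_ci_fetch() {
  local host
  # 1. Stop all fetch-workers (scaled to zero so they are not recreated).
  run ./qareports production k scale deployment "$FETCH_DEPLOYMENT" --replicas=0 || return
  # 2. RabbitMQ host written by terraform; run "./qareports production queues" first.
  host="$(cat terraform/generated/production_rabbitmq_host_public)" || return
  # 3. Purge the queue on the broker.
  run ssh -i tmp/qareports_private_ssh_key "$host" sudo rabbitmqctl purge_queue ci_fetch
}
```

Run it from the repository root, e.g. DRY_RUN=1 purge_ci_fetch to review the commands first. Remember to scale the fetch-workers back up afterwards.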
Sometimes pods misbehave and hang forever in the Terminating state. You can force-terminate such a pod by running:
./qareports production k delete pod --grace-period=0 --force <pod-name>
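If several pods are stuck at once, you can apply the same command to each of them. The sketch below assumes the default column layout of kubectl get pods --no-headers (NAME, READY, STATUS, ...). It also assumes that ./qareports production k passes its arguments through to kubectl. Force deletion skips graceful shutdown, so only use it on pods that are really stuck.

```shell
#!/usr/bin/env bash
# Reads "kubectl get pods --no-headers" output on stdin (NAME READY STATUS ...)
# and prints the names of pods whose STATUS column is "Terminating".
terminating_pods() {
  awk '$3 == "Terminating" { print $1 }'
}

# Force-delete every pod name read on stdin. With DRY_RUN=1 only print the commands.
force_delete_pods() {
  local pod
  while read -r pod; do
    [ -n "$pod" ] || continue
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "./qareports production k delete pod --grace-period=0 --force $pod"
    else
      # </dev/null keeps the command from consuming the remaining pod names.
      ./qareports production k delete pod --grace-period=0 --force "$pod" </dev/null
    fi
  done
}

# Example against a live cluster:
#   ./qareports production k get pods --no-headers | terminating_pods | force_delete_pods
```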
We're currently using AWS Simple Email Service (SES) to send emails. On an account where SES has never been used, AWS keeps it in sandbox mode for security, so SES only sends messages to verified email addresses. For production use, you NEED to open a support ticket with AWS asking to move SES out of sandbox mode.
Once things are cleared in SES, there are 2 ways to send emails: as an SMTP relay or through the RESTful API. We're using the second one for convenience, and it's very easy to make it work. You just spin up a docker container from https://github.com/blueimp/aws-smtp-relay. It accepts SMTP requests and forwards them to SES through its API.
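For local experiments, you can start the relay with something like the line below. It's a sketch, and a few details are assumptions:
- Port 1025 is, as far as I know, aws-smtp-relay's default listen port.
- The region is an example value.
- Credentials come from the usual AWS SDK sources, such as environment variables or an attached IAM role/policy.

```shell
# Sketch: run aws-smtp-relay locally, accepting SMTP on port 1025.
# ASSUMPTIONS: region us-east-1 and port 1025; the AWS SDK inside the container
# must find credentials allowed to send through SES.
docker run --rm -p 1025:1025 -e AWS_REGION=us-east-1 blueimp/aws-smtp-relay
```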
Initially, aws-smtp-relay was placed in the qareports-worker pod, which runs on a serverless Fargate node. Sending emails wasn't possible there because the node needs an SES IAM policy to authenticate to SES, and as of June 2020 I couldn't find a way of providing one. The workaround was to place aws-smtp-relay on a regular EC2 node (the EKS master node), which has the necessary SES policy attached.
The two settings that handle email are SQUAD_EMAIL_HOST and SQUAD_EMAIL_PORT, which should point to the aws-smtp-relay service.
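To check that both settings point somewhere reachable, for example from inside a pod via ./qareports production ssh, you can use a TCP probe like the one below. It's a sketch with two limits. It only proves that a port is open, not that SES accepts the mail. It also assumes bash and coreutils' timeout are available in the container.

```shell
#!/usr/bin/env bash
# Check that SQUAD_EMAIL_HOST/SQUAD_EMAIL_PORT are set and that something accepts
# TCP connections there (e.g. the aws-smtp-relay service).
# Returns 2 for missing/invalid settings, 1 if unreachable, 0 if reachable.
check_email_settings() {
  local host="${SQUAD_EMAIL_HOST:-}" port="${SQUAD_EMAIL_PORT:-}"
  if [ -z "$host" ] || ! [[ "$port" =~ ^[0-9]+$ ]]; then
    echo "SQUAD_EMAIL_HOST/SQUAD_EMAIL_PORT missing or invalid" >&2
    return 2
  fi
  # Open (and immediately close) a TCP connection via bash's /dev/tcp,
  # giving up after 3 seconds.
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "reachable: $host:$port"
  else
    echo "unreachable: $host:$port" >&2
    return 1
  fi
}
```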
Posts that helped a LOT in understanding AWS networking: