EKS release 1.25 with dual managed node groups and pod affinity (#46)
* actively manage the EBS drive volume to give us more control over its lifecycle

* add node affinity

* lint

* lint

* rename pod affinity label key

* add a hosting node group

* refactor

* lint

* lint

* add a managed EBS volume

* add post-deployment script

* bump kubernetes release to 1.25

* deprecate cert_manager_image_version

* lint

* lint

* bump versions: vpc-cni=v1.16.0-eksbuild.1, aws-ebs-csi-driver=v1.16.0-eksbuild.1

* testing

* add bastion IAM arn to aws-auth.mapUsers

* add bastion IAM user to aws-auth.mapUsers via Terragrunt so that it can be customized

* lint

* add change notes
lpm0073 authored Feb 25, 2023
1 parent 47a2737 commit 0abb55e
Showing 43 changed files with 689 additions and 1,634 deletions.
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,15 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/)
and this project adheres to [Semantic Versioning](http://semver.org/).

## [1.0.25] (2023-2-25)

- bump AWS EKS to release 1.25
- bump AWS EKS add-on versions
- parameterize aws-auth.mapUsers
- refactor the AWS EKS managed node groups into two groups, service and hosting. Default the service group to 3 nodes and the hosting group to 0.
- remove the AWS EKS service node taints, replacing them with node affinity for service pods to encourage their isolation from nodes running mostly application software.
- create AWS EBS volumes for WordPress deployments so that we can control their lifecycle, naming, and volume attributes.
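With mapUsers parameterized, extra IAM users can now be supplied from the environment's Terragrunt configuration rather than edited into the module. A minimal sketch, assuming an input named `aws_auth_users` and an illustrative account id and user (neither is confirmed by this diff):

```hcl
# terragrunt.hcl (sketch)
inputs = {
  # extra IAM users to merge into the aws-auth configMap's mapUsers
  aws_auth_users = [
    {
      userarn  = "arn:aws:iam::012345678901:user/ci-admin" # hypothetical user
      username = "ci-admin"
      groups   = ["system:masters"]
    },
  ]
}
```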

## [1.0.24] (2023-2-17)

- refactor MySQL and MongoDB remote backup solutions
21 changes: 10 additions & 11 deletions cookiecutter.json
@@ -26,10 +26,9 @@
   "stack_add_remote_mysql": ["Y", "N"],
   "stack_add_remote_mongodb": ["Y", "N"],
   "stack_add_remote_redis": ["Y", "N"],
-  "cert_manager_image_version": "v1.11.0",
   "ci_build_tutor_version": "15.2.0",
-  "ci_build_kubectl_version": "1.24/stable",
-  "kubernetes_cluster_version": "1.24",
+  "ci_build_kubectl_version": "1.25/stable",
+  "kubernetes_cluster_version": "1.25",
   "ci_build_theme_repository": "edx-theme-example",
   "ci_build_theme_repository_organization": "lpm0073",
   "ci_build_theme_ref": "main",
@@ -84,15 +83,15 @@
   "ci_openedx_actions_tutor_plugin_enable_notes_version": "v1.0.2",
   "ci_openedx_actions_tutor_plugin_enable_s3_version": "v1.0.2",
   "ci_openedx_actions_tutor_plugin_enable_xqueue_version": "v1.0.0",
-  "eks_worker_group_instance_type": "t3.xlarge",
+  "eks_hosting_group_instance_type": "t3.large",
   "eks_create_kms_key": ["Y", "N"],
-  "eks_worker_group_min_size": 0,
-  "eks_worker_group_max_size": 1,
-  "eks_worker_group_desired_size": 0,
-  "eks_karpenter_group_instance_type": "t3.large",
-  "eks_karpenter_group_min_size": 3,
-  "eks_karpenter_group_max_size": 10,
-  "eks_karpenter_group_desired_size": 3,
+  "eks_hosting_group_min_size": 0,
+  "eks_hosting_group_max_size": 1,
+  "eks_hosting_group_desired_size": 0,
+  "eks_service_group_instance_type": "t3.large",
+  "eks_service_group_min_size": 3,
+  "eks_service_group_max_size": 10,
+  "eks_service_group_desired_size": 3,
   "mongodb_instance_type": "t3.medium",
   "mongodb_allocated_storage": 10,
   "bastion_instance_type": "t3.micro",
22 changes: 11 additions & 11 deletions doc/README.rst
@@ -265,7 +265,7 @@ AWS Elastic Kubernetes Service Configuration Options
 **BE AWARE:** there are far-reaching and often irreversible consequences to changing this value.
 DO NOT change this value unless you're certain that you understand what you're doing.

-*default value: 1.24*
+*default value: 1.25*

- **eks_create_kms_key:**

@@ -285,15 +285,15 @@ their rightful 'owner' and can be called back into service by the 'owner' at any
 for you, immediately replacing any node that is called back by its owner. This happens infrequently, with the exception of the eu-west-2 (London)
 AWS data center.

-- **eks_karpenter_group_instance_type:**
+- **eks_service_group_instance_type:**

   The *preferred* instance type that Karpenter will acquire on your behalf from the spot-price marketplace. Note
   that the Terraform scripts include several fallback options in the event that your preferred instance type is not
   available.

   *default value: t3.large*

-- **eks_karpenter_group_min_size:**
+- **eks_service_group_min_size:**

   The minimum number of EC2 compute nodes to maintain inside the compute plane of your cluster. This value
   needs to be at least 1 in order for Karpenter to gather the real-time load and performance metrics that it uses
@@ -303,7 +303,7 @@ AWS data center.

 *default value: 3*

-- **eks_karpenter_group_max_size:**
+- **eks_service_group_max_size:**

   The maximum number of EC2 instances that Karpenter is permitted to add to the Kubernetes compute plane,
   regardless of real-time load metrics.
@@ -314,35 +314,35 @@ AWS data center.

 *default value: 10*

-- **eks_karpenter_group_desired_size:**
+- **eks_service_group_desired_size:**

   The initial setting that Karpenter will use when the EKS cluster is created and initialized.
   This value will potentially change (higher or lower) as soon as the metrics-server and Prometheus
   services begin reporting performance metrics to Karpenter.

   *default value: 3*

-eks_worker_group is an optional, supplemental EC2 node worker group that is included in the
+eks_hosting_group is an optional, supplemental EC2 node worker group that is included in the
 AWS EKS build. If you choose to install Karpenter then you can ignore these options.
 Nodes created in this group use on-demand pricing, which costs around 3x as much as the
 Karpenter nodes' spot pricing. However, availability of on-demand nodes is guaranteed by AWS
 (see the sketch following this list).

-- **eks_worker_group_min_size:**
+- **eks_hosting_group_min_size:**
   The minimum allowed quantity of nodes for this group.

   *default value: 0*

-- **eks_worker_group_max_size:**
+- **eks_hosting_group_max_size:**
   The maximum allowed quantity of nodes for this group.

   *default value: 1*

-- **eks_worker_group_desired_size:**
+- **eks_hosting_group_desired_size:**
   The current run-time requested quantity of nodes for this group.

   *default value: 0*

-- **eks_worker_group_instance_type:**
+- **eks_hosting_group_instance_type:**
   The AWS EC2 instance type that will be created for all nodes in this group.

   *default value: t3.large*
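Taken together, the two groups might be declared along these lines in the EKS module call. This is a sketch only: it assumes the `eks_managed_node_groups` argument of the terraform-aws-modules/eks module, and the `node-group` labels mirror the WordPress chart values elsewhere in this commit; the exact wiring is not shown in this diff.

```hcl
eks_managed_node_groups = {
  # runs platform services; sized from the eks_service_group_* options
  service = {
    instance_types = ["t3.large"]
    min_size       = 3
    max_size       = 10
    desired_size   = 3
    labels         = { node-group = "service" }
  }
  # optional on-demand group for application workloads such as WordPress;
  # sized from the eks_hosting_group_* options
  hosting = {
    instance_types = ["t3.large"]
    min_size       = 0
    max_size       = 1
    desired_size   = 0
    labels         = { node-group = "wordpress" }
  }
}
```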
@@ -571,7 +571,7 @@ The scaffolding that is generated by Cookiecutter provides the code samples that
 singular tool at your disposal for programmatically administering your Kubernetes cluster. Your choice of `kubectl <https://kubernetes.io/docs/tasks/tools/>`_ version (and its installation method) has an equally significant impact on
 the reliability of your deploy workflows.

-*default value: 1.24/stable*
+*default value: 1.25/stable*

- **ci_build_theme_repository:**

6 changes: 3 additions & 3 deletions {{cookiecutter.github_repo_name}}/doc/QUICKSTART.rst
@@ -93,10 +93,10 @@ Review your production environment parameters.
   redis_node_type = "cache.t2.small"
   # 2 vCPU 8gb
-  eks_worker_group_instance_type = "t3.large"
+  eks_hosting_group_instance_type = "t3.large"
   # 2 vCPU 8gb
-  eks_karpenter_group_instance_type = "t3.large"
+  eks_service_group_instance_type = "t3.large"
 }
@@ -225,7 +225,7 @@ Following is an example aws-auth configMap with additional IAM user accounts added
   resourceVersion: "499488"
   uid: 52d6e7fd-01b7-4c80-b831-b971507e5228
-Note that by default, Kubernetes version 1.24 and newer encrypts all secrets data using `AWS Key Management Service (KMS) <https://aws.amazon.com/kms/>`_.
+Note that by default, Kubernetes version 1.25 and newer encrypts all secrets data using `AWS Key Management Service (KMS) <https://aws.amazon.com/kms/>`_.
 The Cookiecutter automatically adds the IAM user for the bastion server.
 For any other IAM users you'll need to modify the following in terraform/stacks/modules/kubernetes/main.tf:
16 changes: 8 additions & 8 deletions {{cookiecutter.github_repo_name}}/make.sh
@@ -35,14 +35,14 @@ cookiecutter --checkout $GITHUB_BRANCH \
   environment_name={{ cookiecutter.environment_name }} \
   environment_subdomain={{ cookiecutter.environment_subdomain }} \
   eks_create_kms_key={{ cookiecutter.eks_create_kms_key }} \
-  eks_worker_group_instance_type={{ cookiecutter.eks_worker_group_instance_type }} \
-  eks_worker_group_min_size={{ cookiecutter.eks_worker_group_min_size }} \
-  eks_worker_group_max_size={{ cookiecutter.eks_worker_group_max_size }} \
-  eks_worker_group_desired_size={{ cookiecutter.eks_worker_group_desired_size }} \
-  eks_karpenter_group_instance_type={{ cookiecutter.eks_karpenter_group_instance_type }} \
-  eks_karpenter_group_min_size={{ cookiecutter.eks_karpenter_group_min_size }} \
-  eks_karpenter_group_max_size={{ cookiecutter.eks_karpenter_group_max_size }} \
-  eks_karpenter_group_desired_size={{ cookiecutter.eks_karpenter_group_desired_size }} \
+  eks_hosting_group_instance_type={{ cookiecutter.eks_hosting_group_instance_type }} \
+  eks_hosting_group_min_size={{ cookiecutter.eks_hosting_group_min_size }} \
+  eks_hosting_group_max_size={{ cookiecutter.eks_hosting_group_max_size }} \
+  eks_hosting_group_desired_size={{ cookiecutter.eks_hosting_group_desired_size }} \
+  eks_service_group_instance_type={{ cookiecutter.eks_service_group_instance_type }} \
+  eks_service_group_min_size={{ cookiecutter.eks_service_group_min_size }} \
+  eks_service_group_max_size={{ cookiecutter.eks_service_group_max_size }} \
+  eks_service_group_desired_size={{ cookiecutter.eks_service_group_desired_size }} \
   mysql_instance_class={{ cookiecutter.mysql_instance_class }} \
   mysql_allocated_storage={{ cookiecutter.mysql_allocated_storage }} \
   redis_node_type={{ cookiecutter.redis_node_type }} \
Expand Up @@ -194,8 +194,7 @@ resource "kubernetes_secret" "credentials" {
MYSQL_PORT = data.kubernetes_secret.mysql_root.data.MYSQL_PORT
}
}
{% endif %}
{% if cookiecutter.ci_deploy_install_license_manager_service|upper == "Y" -%}
{% endif %}{% if cookiecutter.ci_deploy_install_license_manager_service|upper == "Y" -%}
resource "random_password" "mysql_license_manager" {
length = 16
special = true
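Joining the `{% endif %}` and `{% if %}` tags on one line presumably keeps Jinja from rendering a stray blank line between the two blocks in the generated Terraform.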

This file was deleted.

@@ -71,7 +71,7 @@ affinity:
       - weight: 1
         preference:
           matchExpressions:
-            - key: application-group
+            - key: node-group
               operator: In
               values:
                 - wordpress
@@ -82,6 +82,11 @@ nodeSelector: {}
 ## @param tolerations Tolerations for pod assignment
 ## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
 ##
+tolerations:
+  - effect: NoSchedule
+    key: role
+    operator: Equal
+    value: pvc-pods
 ## WordPress containers' resource requests and limits
 ## ref: https://kubernetes.io/docs/user-guide/compute-resources/
 ## @param resources.limits The resources limits for the WordPress containers
@@ -101,7 +106,7 @@ containerPorts:
 ingress:
   enabled: false
 persistence:
-  size: ${persistenceSize}
+  existingClaim: "${wordpressDomain}"
 serviceAccount:
   create: ${serviceAccountCreate}
   name: ${serviceAccountName}
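These three changes work together: the preferred node affinity steers WordPress pods toward nodes labeled `node-group=wordpress`, the new toleration lets them schedule onto nodes tainted `role=pvc-pods:NoSchedule`, and `persistence.existingClaim` points the chart at the Terraform-managed PVC (named after the WordPress domain) instead of a dynamically provisioned one. The label and taint are presumably applied to the hosting node group, though that wiring is not shown in this diff.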
@@ -0,0 +1,140 @@
#------------------------------------------------------------------------------
# written by: Lawrence McDaniel
# https://lawrencemcdaniel.com/
#
# date: Feb-2023
#
# usage: create a detachable EBS volume to be used as the PVC for the WordPress pod.
#
# Problems we're trying to solve: the Bitnami WordPress chart provides
# dynamic PVC and volume management by default, but there are shortcomings:
#
#   1. the EBS volume gets destroyed whenever we run Terraform destroy on
#      a WordPress site, which is usually **not** what we want.
#
#   2. the EBS volumes are generically named and tagged. We'd prefer to see
#      identifying information that tells us which EBS volume belongs
#      to which WordPress site.
#
#   3. the Bitnami chart lacks granular control over the design attributes of
#      the EBS volume, the PV, and the PVC. We want to retain the ability to
#      fine-tune these in the future.
#------------------------------------------------------------------------------

#------------------------------------------------------------------------------
# KUBERNETES RESOURCES
#------------------------------------------------------------------------------
resource "kubernetes_persistent_volume_claim" "wordpress" {
metadata {
name = local.wordpressDomain
namespace = local.wordpressNamespace
labels = {
"ebs_volume_id" = "${aws_ebs_volume.wordpress.id}"
"name" = "${local.wordpressDomain}"
"namespace" = "${local.wordpressNamespace}"
}
}

spec {
access_modes = ["ReadWriteOnce"]
storage_class_name = "gp2"
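    # note: a PVC storage request may not exceed the PV's capacity;
    # this requests half of the EBS volume's size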
resources {
requests = {
storage = "${local.persistenceSize / 2}Gi"
}
}
volume_name = kubernetes_persistent_volume.wordpress.metadata.0.name
}

depends_on = [
kubernetes_persistent_volume.wordpress
]
}

resource "kubernetes_persistent_volume" "wordpress" {
metadata {
name = local.wordpressDomain
labels = {
"topology.kubernetes.io/region" = "${var.aws_region}"
"topology.kubernetes.io/zone" = "${aws_ebs_volume.wordpress.availability_zone}"
"ebs_volume_id" = "${aws_ebs_volume.wordpress.id}"
"name" = "${local.wordpressDomain}"
"namespace" = "${local.wordpressNamespace}"
}
annotations = {
}
}

spec {
capacity = {
storage = "${local.persistenceSize}Gi"
}
access_modes = ["ReadWriteOnce"]
storage_class_name = "gp2"
persistent_volume_source {
aws_elastic_block_store {
volume_id = aws_ebs_volume.wordpress.id
fs_type = "ext4"
}
}
node_affinity {
required {
node_selector_term {
match_expressions {
key = "topology.kubernetes.io/zone"
operator = "In"
values = ["${aws_ebs_volume.wordpress.availability_zone}"]
}
match_expressions {
key = "topology.kubernetes.io/region"
operator = "In"
values = ["${var.aws_region}"]
}
}
}
}
}

depends_on = [
aws_ebs_volume.wordpress
]
}

#------------------------------------------------------------------------------
# AWS ELASTIC BLOCK STORE RESOURCES
#------------------------------------------------------------------------------
# a detachable EBS volume for the WordPress databases
resource "aws_ebs_volume" "wordpress" {
availability_zone = data.aws_subnet.private_subnet.availability_zone
size = local.persistenceSize
tags = var.tags

  # NOTE: Terraform requires prevent_destroy to be a literal value, so it
  # cannot reference local.ebsVolumePreventDestroy here. Set it to true to
  # protect the volume from 'terraform destroy'.
  lifecycle {
    prevent_destroy = false
  }

depends_on = [
data.aws_subnet.private_subnet
]
}


#------------------------------------------------------------------------------
# SUPPORTING RESOURCES
#------------------------------------------------------------------------------

data "aws_subnet" "private_subnet" {
id = var.subnet_ids[random_integer.subnet_id.result]
}

# randomize the choice of subnet. Each of the possible subnets corresponds
# to an AWS availability zone in the region. Most regions have three
# availability zones, but some, like us-east-1, have more.
resource "random_integer" "subnet_id" {
min = 0
max = length(var.subnet_ids) - 1
}
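In use, the Bitnami chart consumes this volume through the `persistence.existingClaim` value shown earlier, which is set to the WordPress domain and therefore matches the PVC name above; Helm no longer provisions its own PVC, and the volume's naming, tags, and lifecycle are controlled from Terraform as described in the header comment.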
