#23: Setup ci incl. AWS code build (#70)
* Added cli output for pytest in AWS codebuild
* Fixed permissions for creating EC2 instance
* Updated developer guide
* Update changes file
* Replaced become: true by {{need_sudo}}
* Added -s to build script
* Replaced short cli options in build script by long ones
* Added file release_config.yml and updated developer guide

Co-authored-by: Torsten Kilias <tkilias@users.noreply.github.com>
ckunki and tkilias committed Nov 23, 2023
1 parent 8a538ca commit 32e01c8
Showing 7 changed files with 89 additions and 43 deletions.
5 changes: 4 additions & 1 deletion .github/workflows/check_ci.yaml
@@ -19,7 +19,10 @@ jobs:

- name: Run pytest
run: >
poetry run pytest -o log_cli=true -o log_cli_level=INFO
poetry run pytest
--capture=no
--override-ini=log_cli=true
--override-ini=log_cli_level=INFO
test/unit
test/integration/test_create_dss_docker_image.py
env: # Set the secret as an env variable
6 changes: 4 additions & 2 deletions aws-code-build/ci/buildspec.yaml
@@ -22,5 +22,7 @@ phases:
commands:
- echo DSS_RUN_CI_TEST is "$DSS_RUN_CI_TEST" #supposed to be true
build:
commands:
- poetry run python3 -m pytest -s test/ci/test_ci*.py
commands: >
poetry run python3 -m pytest
-s -o log_cli=true -o log_cli_level=INFO
test/ci/test_ci*.py
2 changes: 2 additions & 0 deletions doc/changes/changes_0.1.0.md
@@ -21,6 +21,7 @@ Version: 0.1.0
* #53: Moved Jupyter notebooks to folder visible to ansible
* #16: Installed Jupyter notebooks via ansible
* #67: Removed apt cache to reduce image size
* #23: Fixed AWS Code build

## Bug Fixes

@@ -32,6 +33,7 @@ Version: 0.1.0
* #5: Renamed all occurrences of "script language developer" by "data science"
* #56: Moved jupyter notebook files again
* #63: Improved logging of Ansible tasks
* #46: Enabled to suppress ansible output

## Documentation

112 changes: 74 additions & 38 deletions doc/developer_guide/developer_guide.md
@@ -28,50 +28,86 @@ bash install.sh
The Data Science Sandbox (DSS) uses AWS as backend, because it provides the possibility to run the whole workflow during a ci-test.

This project uses
- `boto3` to interact with AWS
- `pygithub` to interact with the Github releases
- `ansible-runner` to interact with Ansible.
* `boto3` to interact with AWS
* `pygithub` to interact with the Github releases
* `ansible-runner` to interact with Ansible.
Proxy classes for those packages are injected at the CLI layer, which makes it possible to inject mock classes in the unit tests.
A CLI command normally has a corresponding function in the `lib` submodule. Hence, the CLI layer should not contain any logic, but only invoke the respective library function. Likewise, the proxy classes which abstract the dependent packages should contain as little logic as possible. Ideally, each proxy method invokes only one function of the respective package.
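
The injection pattern described above can be sketched as follows. This is only an illustration under assumptions, not the project's actual code: the names `AwsProxy`, `FakeAwsProxy`, `list_instance_ids`, and `show_instances` are hypothetical.

```python
from typing import List, Protocol


class AwsProxy(Protocol):
    """Hypothetical interface of a proxy class wrapping an external package."""

    def list_instance_ids(self) -> List[str]:
        ...


class FakeAwsProxy:
    """Test double standing in for a real proxy (which would call boto3)."""

    def __init__(self, instance_ids: List[str]) -> None:
        self._instance_ids = instance_ids

    def list_instance_ids(self) -> List[str]:
        return list(self._instance_ids)


def show_instances(aws: AwsProxy) -> str:
    """Library function: holds the logic and receives the proxy as an argument."""
    ids = aws.list_instance_ids()
    return ", ".join(sorted(ids)) if ids else "no instances"


# A CLI command would only construct the real proxy and delegate to the
# library function; a unit test injects the fake instead.
result = show_instances(FakeAwsProxy(["i-2", "i-1"]))
```

Because the library function only depends on the proxy's interface, the unit test needs neither AWS credentials nor network access.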


## Commands

There are generally three types of commands:
The commands offered by the DSS CLI can be organized into three groups:

| Type | Explanation |
|----------------------|----------------------------------------------|
| Release Commands | used during the release |
| Deployment Commands | used to deploy infrastructure onto AWS cloud |
| Development Commands | used to identify problems or for testing |
| Group | Usage |
|----------------------|-----------------------------------------|
| Release Commands | during the release |
| Deployment Commands | to deploy infrastructure onto AWS cloud |
| Development Commands | to identify problems or for testing |

### Release commands

The following commands are used during the release AWS Codebuild job:
- `create-vm` - creates a new AMI and VM images
- `update-release` - updates release notes of an existing Github release
- `start-release-build` - starts the release on AWS codebuild
* `create-vm`: Create a new AMI and VM images.
* `update-release`: Update release notes of an existing Github release.
* `start-release-build`: Start the release on AWS codebuild.
* `create-docker-image`: Create a Docker image for data-science-sandbox and deploy it to hub.docker.com/exasol/data-science-sandbox.

Script `start-release-build`:
* Is usually called from github workflow `release_droid_upload_github_release_assets.yml`.
* Requires environment variable `GH_TOKEN` to contain a valid token for access to Github.
* Requires CLI option `--upload-url` to be specified.

This operation usually takes around 1:40 hours.
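
Since the release build runs for a long time, it can be useful to verify the two requirements listed above before starting it. The following is a minimal sketch of such a pre-flight check; the function `check_release_build_inputs` is hypothetical and not part of the DSS CLI.

```python
import os
from typing import List, Mapping, Optional


def check_release_build_inputs(
    upload_url: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return a list of problems; an empty list means both inputs are present."""
    problems = []
    # GH_TOKEN must be set and non-empty.
    if not env.get("GH_TOKEN"):
        problems.append("environment variable GH_TOKEN is not set")
    # The value passed to --upload-url must be non-empty.
    if not upload_url:
        problems.append("CLI option --upload-url is missing")
    return problems
```

The check only tests for presence. Whether the token is actually valid for Github can only be determined by using it.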

### Developer commands

All other commands provide a subset of the features of the release commands, and can be used to identify problems or simulate the release:
- `export-vm` - creates a new VM image from a running EC2-Instance
- `install-dependencies` - starts an ansible-installation onto an existing EC-2 instance
- `reset-password` - resets password on a remote EC-2-instance via ansible
- `setup-ec2` - starts a new EC2 instance (based on an Ubuntu AMI)
- `setup-ec2-and-install-dependencies` - starts a new EC2 instance and install dependencies via Ansible
- `show-aws-assets` - shows AWS entities associated with a specific keyword (called __asset-id__)
- `start-test-release` - starts a Test Release flow
- `make-ami-public` - Changes permissions of an existing AMI such that it becomes public
* `export-vm`: Create a new VM image from a running EC2-Instance.
* `install-dependencies`: Start an ansible-installation onto an existing EC-2 instance.
* `reset-password`: Reset password on a remote EC-2-instance via ansible.
* `setup-ec2`: Start a new EC2 instance (based on an Ubuntu AMI).
* `setup-ec2-and-install-dependencies`: Start a new EC2 instance and install dependencies via Ansible.
  * The script will print the required SSH login for manual inspection or interaction with the EC2 instance.
  * The instance is kept running until the user presses Ctrl-C.
* `show-aws-assets`: Show AWS entities associated with a specific keyword (called __asset-id__).
* `start-test-release-build`: (For testing) Creates a release on Github and forwards it to the AWS Codebuild which creates VM images in various formats and attaches them to the Github release.
* `make-ami-public`: Change permissions of an existing AMI such that it becomes public.

Script `start-test-release-build` requires environment variable `GH_TOKEN` to contain a valid token for access to Github.

### Deployment commands

The following commands can be used to deploy the infrastructure onto a given AWS account:
- `setup-ci-codebuild` - deploys the AWS Codebuild cloudformation stack which will run the ci-test
- `setup-vm-bucket` - deploys the AWS Bucket cloudformation stack which will be used to deploy the VM images
- `setup-release-codebuild` - deploys the AWS Codebuild cloudformation stack which will be used for the release-build
- `setup-vm-bucket-waf` - deploys the AWS Codebuild cloudformation stack which contains the WAF Acl configuration for the Cloudfront distribution of the VM Bucket
- `create-docker-image` - creates a Docker image for data-science-sandbox and deploys it to hub.docker.com/exasol/data-science-sandbox
* `setup-ci-codebuild`: Deploy the AWS Codebuild cloudformation stack which will run the ci-test.
* `setup-vm-bucket`: Deploy the AWS Bucket cloudformation stack which will be used to deploy the VM images.
* `setup-release-codebuild`: Deploy the AWS Codebuild cloudformation stack which will be used for the release-build.
* `setup-vm-bucket-waf`: Deploy the AWS Codebuild cloudformation stack which contains the WAF Acl configuration for the Cloudfront distribution of the VM Bucket.

For all deployment commands:
* Don't forget to specify CLI option `--aws-profile`.
* Ensure the related AWS stack does not exist. If there was a rollback then please delete the stack manually, otherwise the script will fail.

If `setup-release-codebuild` or `setup-ci-codebuild` fails with error message "_Failed to create webhook. Repository not found or permission denied._" then
* Grant sufficient access permissions to the Github user used by the script.
  * You can use a Github "_Repository role_" for that.
  * The repository role must include the following permissions:
    * Inherit the permissions from default role "Write"
    * Additional repository permission "Manage webhooks"
* In AWS you can configure the Github token by a resource with logical ID `CodeBuildCredentials`.
  * Please note: There must be only one stack containing such a resource.
  * The definition of the AWS resource `CodeBuildCredentials` can use credentials from the AWS Secrets Manager.

```yaml
Resources:
  CodeBuildCredentials:
    Type: AWS::CodeBuild::SourceCredential
    Properties:
      ServerType: GITHUB
      AuthType: PERSONAL_ACCESS_TOKEN
      Username: "{{resolve:secretsmanager:github_personal_token:SecretString:github_user_name}}"
      Token: "{{resolve:secretsmanager:github_personal_token:SecretString:github_personal_token}}"
```
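
The `Username` and `Token` values above are CloudFormation dynamic references to one secret with two JSON keys. To make their structure explicit, the following sketch splits such a reference into the secret name and the JSON key. It assumes the four-segment form used in the template and a secret referenced by name rather than by ARN; `SecretReference` and `parse_secret_reference` are hypothetical helper names.

```python
import re
from typing import NamedTuple


class SecretReference(NamedTuple):
    secret_id: str  # name of the secret in AWS Secrets Manager
    json_key: str   # key inside the secret's JSON document


# Matches {{resolve:secretsmanager:<secret-id>:SecretString:<json-key>}}
_PATTERN = re.compile(
    r"^\{\{resolve:secretsmanager:"
    r"(?P<secret_id>[^:{}]+):SecretString:(?P<json_key>[^:{}]+)\}\}$"
)


def parse_secret_reference(value: str) -> SecretReference:
    """Split a dynamic reference of the form shown above into its parts."""
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError(f"not a supported secretsmanager reference: {value!r}")
    return SecretReference(match["secret_id"], match["json_key"])


reference = parse_secret_reference(
    "{{resolve:secretsmanager:github_personal_token:SecretString:github_user_name}}"
)
```

For the template above, both references point to the secret `github_personal_token`, which is therefore expected to hold a JSON document with the keys `github_user_name` and `github_personal_token`.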
## Notebook Files
@@ -82,6 +118,7 @@ Please add or update the notebook files in folder [exasol/ds/sandbox/runtime/ans
## Flow
The following diagram shows the high-level steps to generate the images:
![image info](./img/create-vm-overview.drawio.png)
### Setup EC2
@@ -92,11 +129,11 @@ After the export has finished, the cloudformation stack and the keypair is remov
### Install
Installs all dependencies via Ansible:
- installs Poetry
- installs and configures Jupyter
- installs Docker and adds the user `ubuntu` to the docker group
- clones the script-languages-release repository
- changes the netplan configuration. This is necessary to have proper network configuration when running the VM image
* installs Poetry
* installs and configures Jupyter
* installs Docker and adds the user `ubuntu` to the docker group
* clones the script-languages-release repository
* changes the netplan configuration. This is necessary to have proper network configuration when running the VM image

Finally, the default password will be set, and also the password will be marked as expired, such that the user will be forced to enter a new password during initial login.
Also, the ssh password authentication will be enabled, and for security reasons the folder "~/.ssh" will be removed.
@@ -172,18 +209,17 @@ The export creates an AMI based on the running EC2 instance and exports the AMI
## Release

The release is executed in an AWS Codebuild job. The following diagram shows the flow.

![image info](./img/create-vm-release.drawio.png)

## AWS S3 Bucket

The bucket has private access. In order to control access, the Bucket cloudformation stack also contains a Cloudfront distribution. Public HTTPS access is only possible through Cloudfront. Another stack contains a Web application firewall (WAF), which will be used by the Cloudfront distribution. Due to restrictions in AWS, the WAF stack needs to be deployed in region "us-east-1". The WAF stack provides two rules which aim to minimize a possible bot attack:

| Name | Explanation | Priority |
|----------------------|-----------------------------------------------------------------------------------------|----------|
| VMBucketRateLimit | Declares the minimum possible rate limit for access: 100 requests in a 5 min interval. | 0 |
| CAPTCHA | Forces a captcha action for any IP which does not matcha predefined set of IP-addresses | 1 |


| Name | Explanation | Priority |
|----------------------|-------------------------------------------------------------------------------------------|----------|
| VMBucketRateLimit | Declares the minimum possible rate limit for access: 100 requests in a 5 min interval. | 0 |
| CAPTCHA | Forces a captcha action for any IP which does not match a predefined set of IP-addresses. | 1 |
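
To illustrate what a limit of 100 requests in a 5 min interval means per client, here is a minimal sliding-window sketch. It is only a model of the rule `VMBucketRateLimit`, not a description of how AWS WAF evaluates rate-based rules internally; the class `SlidingWindowRateLimiter` is hypothetical.

```python
from collections import deque
from typing import Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per client within `window_seconds`."""

    def __init__(self, limit: int = 100, window_seconds: float = 300.0) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = {}

    def allow(self, client_ip: str, now: float) -> bool:
        hits = self._hits.setdefault(client_ip, deque())
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowRateLimiter()
# 101 requests from the same address, one per second.
results = [limiter.allow("203.0.113.7", float(second)) for second in range(101)]
```

In this model the first 100 requests pass and the 101st is rejected until the oldest request is more than 5 minutes old.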

## Involved Cloudformation stacks

@@ -214,8 +250,8 @@ The command `show-aws-assets` lists all assets which were created during the exe
## How to contribute

The project has two types of CI tests:
- unit tests and integration tests which run in a Github workflow
- A system test which runs on a AWS Codebuild
* unit tests and integration tests which run in a Github workflow
* A system test which runs on a AWS Codebuild

Both ci tests need to pass before the approval of a Github PR.
The Github workflow will run on each push to a branch in the Github repository. However, the AWS Codebuild will only run after you push a commit containing the string "[CodeBuild]" in the commit message.
2 changes: 2 additions & 0 deletions exasol/ds/sandbox/runtime/ansible/cleanup_tasks.yml
@@ -1,7 +1,9 @@
- name: Ansible Clean
  ansible.builtin.apt:
    clean: yes
  become: "{{need_sudo}}"
- name: Remove files in /var/lib/apt/lists/
  ansible.builtin.file:
    path: /var/lib/apt/lists/
    state: absent
  become: "{{need_sudo}}"
@@ -3,5 +3,3 @@
ansible.builtin.synchronize:
src: "notebook/"
dest: "{{jupyterlab.notebook_folder}}"
rsync_opts:
- "--chmod=0644"
3 changes: 3 additions & 0 deletions release_config.yml
@@ -0,0 +1,3 @@
release-platforms:
- GitHub
language: Generic
