Update the calibration workflow, clean up readme
dbarrous-navteca committed Jan 8, 2024
1 parent b096611 commit 7ea3b05
Showing 4 changed files with 32 additions and 222 deletions.
16 changes: 14 additions & 2 deletions .github/workflows/calibration.yml
@@ -28,8 +28,20 @@ jobs:

- name: Test Lambda Function with curl
run: |
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d @lambda_function/tests/test_data/test_eea_event.json
# Run curl and write the HTTP status code to a variable
HTTP_STATUS=$(curl -X POST "http://localhost:9000/2015-03-31/functions/function/invocations" \
-d @lambda_function/tests/test_data/test_eea_event.json \
-o response.json -w '%{http_code}')
# Check if the HTTP status is 200 (OK)
if [ "$HTTP_STATUS" -eq 200 ]; then
echo "Success: HTTP status is 200"
exit 0 # Exit with success
else
echo "Error or unexpected HTTP status: $HTTP_STATUS"
exit 1 # Exit with failure
fi

- name: Copy Processed Files from Container
run: |
container_id=$(docker ps -qf "ancestor=processing_function:latest")
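For reference, the updated check can be exercised locally before pushing. The sketch below is an assumption-laden reproduction: it reuses the `processing_function:latest` image name from the copy step above and assumes the container runs the Lambda Runtime Interface Emulator on its standard internal port 8080, mapped to port 9000 as in the curl URL.

```
# Minimal local reproduction of the workflow's status check (a sketch,
# assuming the image is tagged processing_function:latest and exposes the
# Lambda Runtime Interface Emulator on port 8080 inside the container).
docker run -d -p 9000:8080 processing_function:latest

# Same pattern as the workflow: write the body to response.json and
# capture only the HTTP status code in a variable.
HTTP_STATUS=$(curl -X POST "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d @lambda_function/tests/test_data/test_eea_event.json \
  -o response.json -w '%{http_code}')

if [ "$HTTP_STATUS" -eq 200 ]; then
  echo "Success: HTTP status is 200"
else
  echo "Error or unexpected HTTP status: $HTTP_STATUS"
fi
```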
60 changes: 5 additions & 55 deletions README.md
@@ -6,8 +6,9 @@
### **Base Image Used For Container:** https://github.com/HERMES-SOC/docker-lambda-base

### **Description**:
This repository defines the image used for the SWSOC file processing Lambda function container. This container will be built and stored in an ECR Repo.
The container will contain the latest release code as the production environment and the latest code on master as the development environment. Files with the appropriate naming convention will be handled in production, while files prefixed with `dev_` will be handled in the development environment.
This repository defines the image used for the SWSOC file processing Lambda function container. This container will be built and stored in the appropriate development/production ECR Repo.

The container will contain the latest release code as the production environment and the latest code on master as the development environment.

### **Testing Locally (Using own Test Data)**:
1. Build the Lambda container image you'd like to test (from within the lambda_function folder):
@@ -35,56 +36,5 @@ The container will contain the latest release code as the production environment

`curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d @lambda_function/tests/test_data/test_eea_event.json`

# Information on working with a CDK Project

The `cdk.json` file tells the CDK Toolkit how to execute your app.

This project is set up like a standard Python project. The initialization
process also creates a virtualenv within this project, stored under the `.venv`
directory. To create the virtualenv it assumes that there is a `python3`
(or `python` for Windows) executable in your path with access to the `venv`
package. If for any reason the automatic creation of the virtualenv fails,
you can create the virtualenv manually.

To manually create a virtualenv on MacOS and Linux:

```
$ python3 -m venv .venv
```

After the init process completes and the virtualenv is created, you can use the following
step to activate your virtualenv.

```
$ source .venv/bin/activate
```

If you are on a Windows platform, you would activate the virtualenv like this:

```
% .venv\Scripts\activate.bat
```

Once the virtualenv is activated, you can install the required dependencies.

```
$ pip install -r requirements.txt
```

At this point you can now synthesize the CloudFormation template for this code.

```
$ cdk synth
```

To add additional dependencies, for example other CDK libraries, just add
them to your `setup.py` file and rerun the `pip install -r requirements.txt`
command.

## Useful commands for CDK

* `cdk ls` list all stacks in the app
* `cdk synth` emits the synthesized CloudFormation template
* `cdk deploy` deploy this stack to your default AWS account/region
* `cdk diff` compare deployed stack with current state
* `cdk docs` open CDK documentation
### **How this Lambda Function is deployed**
This Lambda function is part of the main SWxSOC Pipeline ([Architecture Repo Link](https://github.com/HERMES-SOC/sdc_aws_pipeline_architecture)). It is deployed via AWS CodeBuild within that repository. The image is first built and tagged in the appropriate production or development repository (depending on whether the trigger is a release or a commit). View the CodeBuild CI/CD file [here](buildspec.yml).
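As a rough illustration of the build-and-tag step described above, a typical ECR push looks like the sketch below. The account ID, region, and repository name are placeholders, not values taken from buildspec.yml.

```
# Hypothetical ECR push (placeholders only; the real values are defined
# in buildspec.yml and the pipeline architecture repository).
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

docker build -t processing_function:latest lambda_function/
docker tag processing_function:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/processing_function:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/processing_function:latest
```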
1 change: 0 additions & 1 deletion lambda_function/requirements.txt
@@ -3,5 +3,4 @@ hermes_spani @ git+https://github.com/HERMES-SOC/hermes_spani.git
hermes_eea @ git+https://github.com/HERMES-SOC/hermes_eea.git
hermes_nemisis @ git+https://github.com/HERMES-SOC/hermes_nemisis.git
hermes_merit @ git+https://github.com/HERMES-SOC/hermes_merit.git
cdftracker @ git+https://github.com/HERMES-SOC/CDFTracker.git
psycopg2-binary==2.9.7
177 changes: 13 additions & 164 deletions lambda_function/src/file_processor/file_processor.py
@@ -17,11 +17,9 @@
get_instrument_bucket,
)
from sdc_aws_utils.aws import (
create_s3_client_session,
object_exists,
download_file_from_s3,
upload_file_to_s3,
create_s3_file_key,
parse_file_key,
get_science_file,
push_science_file,
)

# Configure logger
@@ -118,55 +116,32 @@ def _process_file(self) -> None:
)

# Parse file key to needed information
(
parsed_file_key,
this_instr,
destination_bucket,
) = self._parse_file(self.file_key, self.environment)
parsed_file_key = parse_file_key(self.file_key)

# Parse the science file name
science_file = science_filename_parser(parsed_file_key)
this_instr = science_file["instrument"]
destination_bucket = get_instrument_bucket(this_instr, self.environment)

# Download file from S3 or get local file path
file_path = self._get_file(
file_path = get_science_file(
self.instrument_bucket_name,
self.file_key,
parsed_file_key,
self.dry_run,
)

# Calibrate/Process file with Instrument Package
calibrated_filename = self._calibrate_file(this_instr, file_path, self.dry_run)

# Push file to S3 Bucket
self._put_file(
push_science_file(
science_filename_parser,
destination_bucket,
calibrated_filename,
self.dry_run,
)

@staticmethod
def _parse_file(file_key, environment):
"""
Parses the file key to extract the instrument name,
and determines the destination bucket based on the instrument and environment.
:param file_key: The key of the file in the S3 bucket.
:type file_key: str
:param environment: The current running environment (e.g., DEVELOPMENT).
:type environment: str
:return: A tuple containing key, instrument and bucket.
:rtype: tuple
"""
# Parse file key to get instrument name
file_key_array = file_key.split("/")
parsed_file_key = file_key_array[-1]

# Parse the science file name
science_file = science_filename_parser(parsed_file_key)
this_instr = science_file["instrument"]
destination_bucket = get_instrument_bucket(this_instr, environment)

return parsed_file_key, this_instr, destination_bucket

@staticmethod
def _calibrate_file(instrument, file_path, dry_run=False):
"""
@@ -234,130 +209,4 @@ def _calibrate_file(instrument, file_path, dry_run=False):
return calibrated_filename

except ValueError as e:
log.error(e)

@staticmethod
def _get_file(instrument_bucket_name, file_key, parsed_file_key, dry_run=False):
"""
Downloads the file from the specified S3 bucket, if not in a dry run.
If a file path is specified in the environment variables, it uses that instead.
:param instrument_bucket_name: The instrument bucket name.
:type instrument_bucket_name: str
:param file_key: The key of the file in the S3 bucket.
:type file_key: str
:param parsed_file_key: The parsed name of the file.
:type parsed_file_key: str
:param dry_run: Indicates whether the operation is a dry run.
:type dry_run: bool
:return: The path to the downloaded file or None if in a dry run.
:rtype: Path or None
"""
# Download file from instrument bucket if not a dry run
# or use the specified file path
if not dry_run:
# Check if using test data in instrument package
if os.getenv("USE_INSTRUMENT_TEST_DATA") == "True":
log.info("Using test data from instrument package")
return None

# Check if file path is specified in environment variables
if os.getenv("SDC_AWS_FILE_PATH"):
log.info(
"Using file path specified in environment variables"
f"{os.getenv('SDC_AWS_FILE_PATH')}"
)
file_path = Path(os.getenv("SDC_AWS_FILE_PATH"))
return file_path

# Initialize S3 Client
s3_client = create_s3_client_session()

# Verify object exists in instrument bucket
if not (
object_exists(
s3_client=s3_client,
bucket=instrument_bucket_name,
file_key=file_key,
)
or dry_run
):
raise FileNotFoundError(
f"File {file_key} does not exist in bucket {instrument_bucket_name}"
)

# Download file from S3 bucket if no file path is specified
file_path = download_file_from_s3(
s3_client,
instrument_bucket_name,
file_key,
parsed_file_key,
)

return file_path
else:
log.info("Dry Run - File will not be downloaded")
return None

@staticmethod
def _put_file(
science_filename_parser, destination_bucket, calibrated_filename, dry_run=False
):
"""
Uploads a file to the specified destination bucket in S3, if not in a dry run.
Generates the file key for the new file using the given parser.
:param science_filename_parser: The parser function to generate a file key.
:type science_filename_parser: function
:param destination_bucket: The name of the destination S3 bucket.
:type destination_bucket: str
:param calibrated_filename: The pathname of the new file to be uploaded.
:type calibrated_filename: str
:param dry_run: Indicates whether the operation is a dry run.
:type dry_run: bool
:return: The key of the newly uploaded file.
:rtype: str
"""
# Generate file key for new file
new_file_key = create_s3_file_key(science_filename_parser, calibrated_filename)

# Upload file to destination bucket if not a dry run
if dry_run:
log.info("Dry Run - File will not be uploaded")
return new_file_key

if os.getenv("USE_INSTRUMENT_TEST_DATA") == "True":
log.info("Using test data from instrument package")
return new_file_key

if not os.getenv("SDC_AWS_FILE_PATH"):
# Initialize S3 Client
s3_client = create_s3_client_session()

# Verify object does not exist in instrument bucket
if object_exists(
s3_client=s3_client,
bucket=destination_bucket,
file_key=new_file_key,
):
log.warning(
f"File {new_file_key} already exists in bucket {destination_bucket}"
)
return new_file_key

# Upload file to destination bucket
upload_file_to_s3(
s3_client=s3_client,
destination_bucket=destination_bucket,
filename=calibrated_filename,
file_key=new_file_key,
)

else:
log.info(
"File Processed Locally - File will not be uploaded,"
"available in mounted volume as:"
f"{Path(calibrated_filename).as_posix()}"
)

return new_file_key
log.error(e)
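Taken together, the refactor replaces the three private static methods with calls into `sdc_aws_utils`. A minimal sketch of the resulting flow is below; the `sdc_aws_utils.aws` imports match this diff, while the origins of `science_filename_parser` and `get_instrument_bucket` sit outside the visible hunks and are marked as assumptions.

```
# A sketch of the refactored pipeline, not the module itself.
from sdc_aws_utils.aws import (
    parse_file_key,
    get_science_file,
    push_science_file,
)


def process(file_key, instrument_bucket_name, environment, dry_run=False):
    # Reduce the full S3 key to the bare file name
    parsed_file_key = parse_file_key(file_key)

    # Parse the science file name to find the instrument and its bucket
    science_file = science_filename_parser(parsed_file_key)  # import path assumed
    destination_bucket = get_instrument_bucket(  # import path assumed
        science_file["instrument"], environment
    )

    # Download the file (or resolve a local/test path), then calibrate
    file_path = get_science_file(instrument_bucket_name, file_key, parsed_file_key, dry_run)
    calibrated_filename = calibrate(science_file["instrument"], file_path, dry_run)  # stands in for _calibrate_file

    # Upload the calibrated product to the destination bucket
    return push_science_file(science_filename_parser, destination_bucket, calibrated_filename, dry_run)
```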
