Skip to content

This repository contains an example of how to run Batch processes on OpenShift using Argo Workflows.

License

Notifications You must be signed in to change notification settings

alvarolop/spring-batch-capitalator

Repository files navigation

Exploring Spring Batch Framework

1. Introduction

This demo contains all you need to deploy Argo Workflows to coordinate the deployment and execution of batch processes created using Spring Batch. For that reason, this repository contains the following sections:

  • First, a quick mechanism to deploy ArgoCD. Argo CD automates the deployment of the desired application states in the specified target environments. We will use it to deploy Argo Workflows and our Spring Batch application.

  • Second, Argo Workflows. Argo Workflows is a container-native workflow engine for orchestrating parallel jobs on Kubernetes. We will use it to handle the execution of the batch processes.

  • Third, Spring Batch. Spring Batch is a processing framework designed for robust execution of jobs. We will use it to load data from a CSV file, process it, and store it in a Database deployed on OpenShift.

  • Forth, PostgreSQL database. PostgreSQL is a relational database management system (RDBMS) emphasizing extensibility and SQL compliance. We will use it later when we evolve the Spring Batch process to make it more similar to reality, modifying DB real data.

ℹ️
All of this will be deployed and automated on OpenShift using the GitOps principles. If you don’t have an OpenShift cluster, check this repo on how to deploy an OCP cluster on-premise.
💡
Data Processing is one of the typical use cases for Argo Workflows.

2. ArgoCD

There are many ways to install an ArgoCD instance. Among all of them, I’ve chosen to use Red Hat’s OpenShift GitOps. Red Hat OpenShift GitOps is based on the open source project Argo CD and provides a similar set of features to what the upstream offers, with additional automation, integration into Red Hat OpenShift Container Platform and the benefits of Red Hat’s enterprise support, quality assurance and focus on enterprise security.

The installation of OpenShift GitOps is based on an operator that is installed using the Operator Lifecycle Manager (OLM). To simplify the installation of the operator and the creation of the ArgoCD instance, I’ve created a simple script that you can run on a clean OCP cluster. This script will install all that you need:

  • ArgoCD instance with the correct RBAC.

  • Argo Workflows server and controller with RBAC as well as some namespaces to run the Workflows.

  • Argo WF workflows to run Spring Batch jobs on OpenShift.

  • A PostgreSQL instance for the future.

All-in-one script
./argocd/auto-install.sh

3. Argo Workflows

There are, also, many official ways to deploy Argo Workflows, all of them on this page of the documentation. Apart from the quick-start installation, there are two mechanisms: 1) the single-YAML release manifests and 2) the community-maintained Helm Charts. For this exercise, we are going to use the Helm Charts, as it provides more flexibility and adaptability.

From the basic Helm Chart provided in their GitHub repository, I’ve also added both a Namespace and a Route to make sure that the resources are not deployed in the default namespace.

I have created a modified values-*.yaml to customize the Argo Workflows behavior:

  • Provide HA configuration, based on the documentation and the Helm Chart example values.

  • Expose metrics and collect them using a ServiceMonitor according to the documentation. The Helm chart already takes care of that in these example values.

  • Limit the namespaces that the controller monitors to make sure that jobs are not executed in the wrong namespaces.

  • Configure the ArgoCD DEX as the SSO identity provider to inherit users and groups from OCP.

As there are some settings not included in the default Helm Chart, I have defined a wrapper chart (argo-workflows/argo-wf-config) that defines the official Helm Chart as a dependency. This new Helm Chart does, at least, the following:

  • Configures RBAC using a SA for each OCP group, and a RoleBinding or ClusterRoleBinding depending on the need.

  • A default namespace to install the server.

  • A Route to expose the server, the OCP way.

  • A ConsoleLink to simplify accessing the Argo Workflows cluster every time.

  • All the resources to deploy a local instance of the DEX server.

ℹ️
For general info, I recommend this guide about Argo Workflows Hardening.

The Argo Workflows server will be deployed and configured automatically once you execute the ./argocd/auto-install.sh command of the previous section. If you already have an ArgoCD that want to reuse, you can also execute the following command:

oc apply -f argocd/apps/application-argowf.yaml

The application is split into several Helm Charts for reusability and maintainability. The following diagram shows the dependencies:

GitOps Architecture
Figure 1. GitOps Architecture

4. Spring Batch application

The Spring Batch application dubbed Capitalator is just a demo application from the official guides, which has been adapted to the needs of this demo use case:

  • New name of the application.

  • Exposing metrics using micrometer (Still WIP).

  • Option to connect to an external PostgreSQL deployed in another OCP namespace, instead of in-memory HyperSQL DB) (Still WIP).

ℹ️
If you are not interested in manual push, just continue to the next section.

4.1. Manual I: Generate container image locally

A simple Dockerfile is stored in src/main/docker/Dockerfile.springboot-jar, and you can manually generate and push the container image to Quay with the following manual steps:

# Generate the Jar file with all the dependencies
mvn clean package

# Add the executable to a container image
podman build -f src/main/docker/Dockerfile.springboot-jar -t spring-boot/spring-batch-capitalator .

# Launch the application
podman run -i --rm spring-boot/spring-batch-capitalator

4.2. Manual II: Push image to Quay

Then, push the image to Quay using the following commands (Previously login to Quay with an authorized account):

# Tag the image to point to your Quay URL
podman tag spring-boot/spring-batch-capitalator quay.io/alopezme/spring-batch-capitalator

# Push image to Quay
podman push quay.io/alopezme/spring-batch-capitalator

4.3. Automated: GitHub Actions

As we don’t want to manually execute commands to generate and push a container image, I have automated the Build and Push process with a GH Workflow that is triggered every time a new commit is pushed and affects one of the files of the application itself.

Also, if a new git tag in semver format is pushed, it will generate an extra image using that tag as a container tag to the Quay repo.

For this to work, it is necessary to create a Robot Account in Quay with write permissions and create the following two secrets in the Git repository:

  • QUAY_REPO_TOKEN.

  • QUAY_REPO_USERNAME.

5. CronJobs: Deploy to OpenShift without Argo Workflows

If you don’t need any Batch Processing Orchestration, you can use an OpenShift CronJob that will execute the job periodically. For that, you have two options, still using ArgoCD to deploy the resources, or deploy them manually:

# ArgoCD application
oc apply -f argocd/apps/application-capitalator-cronjob.yaml

# Apply resources directly
oc apply -f capitalator-cronjob

6. PostgreSQL Database

To make the Spring Batch Capitalator example more similar to a real use case, this repository also provides a simple mechanism to deploy a postgresql database on a side namespace, so that Capitalator can connect and store the uppercase version of the names. You can deploy the DB either by creating an ArgoCD application or applying the resources directly:

# ArgoCD application
oc apply -f argocd/apps/application-postgresql.yaml

# Apply resources directly
oc apply -f db-postgresql/

7. Argo Workflows Architecture

7.1. RBAC

7.2. Metrics

ℹ️
All the metrics from Argo WF are prefixed by argo_workflows_.

Argo WF emits a certain number of controller metrics that inform on the state of the controller at any given time. Then you can configure a Grafana Dashboard to easily control those. I already enabled metrics by default in the current values-argowf.yaml, so the official Helm Chart already creates a ServiceMonitor to collect metrics using the OCP Monitoring operator.

Then, regarding the metrics visualization, I’ve explored the available Dashboards in grafana.com and added these three to the chart:

💡
For a full comprehensive explanation of Argo Workflows metrics, you can check the official documentation.

Dex metrics

Also, we would like to evaluate and show metrics about the Dex IdP. It seems that the metrics exposed are not that extensive, but at least we can enable them for basic debugging:

7.3. REST API

ℹ️
To use the REST API, you need to enable Client authentication, which allows you to authenticate using a Bearer Token. This is already done with the default values.

In order to use the provided Swagger specification, you will need to follow three simple steps:

  1. You need to retrieve the URL with the openAPI spec. You can obtain it from here. As of today, this is the value: https://raw.githubusercontent.com/argoproj/argo-workflows/main/api/openapi-spec/swagger.json. Add it to a brand new Insomnia collection:

    Import OpenAPI collection
    Figure 2. Import OpenAPI collection
  2. Obtain the values from your cluster in order to configure the Swagger variables. You can run the following command:

    echo "bearer_token: $(oc get secret argowf-cluster-admins-token -n argo-workflows -o=jsonpath='{.data.token}' | base64 --decode )"
    echo "base_url: $(oc get routes argowf -n argo-workflows --template='https://{{.spec.host}}')"
    echo "namespace: <Here put the namespace where you normally execute the workflows>"
  3. Now, access the Environment configuration and paste the following JSON to your Swagger Env.

{
    "base_url": "<base_url>",
    "scheme": "http",
    "host": "localhost:2746",
    "namespace": "<namespace>",
    "DEFAULT_HEADERS": {
        "Authorization": "Bearer <bearer_token>"
    }
}

Try a REST API call and happy coding!! 🚀

7.4. API Events

To keep things simple, you can use the api/v1/workflows endpoint to create workflows, but there’s one endpoint that is specifically designed to create workflows via an API: api/v1/events. You should use this for most cases (including Jenkins):

  • It only allows you to create workflows from a WorkflowTemplate, so is more secure.

  • It allows you to parse the HTTP payload and use it as parameters.

  • It allows you to integrate with other systems without you having to change those systems.

  • Webhooks also support GitHub and GitLab, so you can trigger workflow from git actions.

To use this, you need to create a WorkflowTemplate and a workflow event binding:

A workflow event binding consists of:

  • An event selector that matches events.

  • A reference to a WorkflowTemplates using workflowTemplateRef.

  • Optional parameters.

Example of Event execution
curl -sH "Authorization: $ARGO_TOKEN" $ARGO_URL/api/v1/events/whalesay/- -d '{"message": "hello events"}'

About

This repository contains an example of how to run Batch processes on OpenShift using Argo Workflows.

Resources

License

Stars

Watchers

Forks

Packages

No packages published