Merge pull request #1454 from pbk8s/main
Sentiment Analysis Learning Path
jasonrandrews authored Jan 2, 2025
2 parents 313aafc + 2093816 commit 3e691fb
Showing 15 changed files with 427 additions and 0 deletions.
@@ -0,0 +1,93 @@
---
title: Cluster monitoring with Prometheus and Grafana in Amazon EKS
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## CPU and RAM usage statistics with Prometheus and Grafana

Prometheus is a monitoring and alerting tool that collects and queries real-time metrics in cloud-native environments such as Kubernetes. It gathers metrics such as CPU and memory usage, pod counts, and request latency, which help you monitor the health and performance of a Kubernetes cluster. Grafana is a visualization and analytics tool that uses Prometheus as a data source to build interactive dashboards for monitoring and analyzing Kubernetes metrics over time.


## Install Prometheus on Arm-based EKS cluster

This learning path uses `helm` to install Prometheus on the Kubernetes cluster. Follow the [Helm documentation](https://helm.sh/docs/intro/install/) to install it on your laptop.

Create a namespace in your EKS cluster to host the `prometheus` pods:

```console
kubectl create namespace prometheus
```

Add the following Helm repository for Prometheus:

```console
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
```

Install `prometheus` on the cluster with the following command:

```console
helm install prometheus prometheus-community/prometheus \
--namespace prometheus \
--set alertmanager.persistentVolume.storageClass="gp2" \
--set server.persistentVolume.storageClass="gp2"
```

Check that all pods are up and running:

```console
kubectl get pods -n prometheus
```
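
Once the pods are running, you can also read metrics directly from the Prometheus HTTP API rather than only through a dashboard. The sketch below builds an instant-query URL for the `/api/v1/query` endpoint and parses a response in the API's documented JSON shape. It assumes Prometheus has been made reachable on `localhost:9090`, for example with `kubectl port-forward -n prometheus svc/prometheus-server 9090:80`. The `SAMPLE` payload is hand-written for illustration and is not output from a real cluster.

```python
import json
from urllib.parse import urlencode

# Assumption: Prometheus is reachable locally, for example through
# `kubectl port-forward -n prometheus svc/prometheus-server 9090:80`.
PROMETHEUS_URL = "http://localhost:9090"


def instant_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build the URL for an instant query against /api/v1/query."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str) -> dict:
    """Map each series' sorted label pairs to its float value."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    return {
        tuple(sorted(series["metric"].items())): float(series["value"][1])
        for series in payload["data"]["result"]
    }


# Hand-written sample in the API's response format, not real cluster data.
SAMPLE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "node-a"}, "value": [1735800000, "0.42"]},
        ],
    },
})

print(instant_query_url("up"))
print(parse_vector(SAMPLE))
```

To query a live cluster, fetch the generated URL with any HTTP client and pass the response body to `parse_vector`.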


## Install Grafana on Arm-based EKS cluster

Add the following Helm repository for Grafana:

```console
helm repo add grafana https://grafana.github.io/helm-charts
```

Create a `grafana.yaml` file with the following contents:

```yaml
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      url: http://prometheus-server.prometheus.svc.cluster.local
      access: proxy
      isDefault: true
```
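
The data source URL in `grafana.yaml` follows the Kubernetes in-cluster DNS pattern `<service>.<namespace>.svc.<cluster-domain>`. The sketch below shows how that name is assembled, assuming the chart created a service called `prometheus-server` in the `prometheus` namespace (which is what the URL above implies) and that the cluster uses the default `cluster.local` domain. The helper `service_url` is a hypothetical name used only for illustration.

```python
def service_url(service: str, namespace: str, port: int = 80,
                cluster_domain: str = "cluster.local") -> str:
    """Return the in-cluster HTTP URL of a Kubernetes service."""
    host = f"{service}.{namespace}.svc.{cluster_domain}"
    # Port 80 is the HTTP default, so it can be left out of the URL.
    return f"http://{host}" if port == 80 else f"http://{host}:{port}"


# Assumed names: service "prometheus-server" in namespace "prometheus".
print(service_url("prometheus-server", "prometheus"))
```

If you install Prometheus under a different release name or namespace, the service name and the URL in `grafana.yaml` need to change to match.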

Create another namespace for the `grafana` pods:

```console
kubectl create namespace grafana
```

Install `grafana` on the cluster with the following command:

```console
helm install grafana grafana/grafana \
--namespace grafana \
--set persistence.storageClassName="gp2" \
--set persistence.enabled=true \
--set adminPassword='kubegrafana' \
--values grafana.yaml \
--set service.type=LoadBalancer
```

Check that all pods are up and running:

```console
kubectl get pods -n grafana
```

Log in to the Grafana dashboard using the LoadBalancer address and click `Dashboards` in the left navigation pane. Locate the `Kubernetes / Compute Resources / Node` dashboard and click it. You should see a dashboard like the one below for your Kubernetes cluster:

![grafana #center](_images/grafana.png)
@@ -0,0 +1,78 @@
---
title: Monitoring the sentiments with Elasticsearch and Kibana
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Deploy Elasticsearch and Kibana on Arm-based EC2 instance

Elasticsearch is a NoSQL database and a search and analytics engine designed to store, search, and analyze large amounts of data. Its real-time indexing capability is crucial for handling high-velocity data streams such as tweets. Kibana is a dashboard and visualization tool that integrates with Elasticsearch. It provides an interface to explore the Twitter data, apply filters, and receive alerts. There are multiple ways to install Elasticsearch and Kibana; one method is shown below.

Before you begin, ensure that Docker and Docker Compose are installed on your laptop.

Create the following `docker-compose.yml` file:

```yml
version: '2.18.1'
services:
  elasticsearch:
    image: elasticsearch:8.15.2
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
      - xpack.security.enabled=false
      - HTTP_ENABLE=true
    ports:
      - "9200:9200"
    networks:
      - elk

  kibana:
    image: kibana:8.15.2
    container_name: kibana
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      - HTTP_ENABLE=true
    depends_on:
      - elasticsearch
    networks:
      - elk

networks:
  elk:
    driver: bridge
```

Use the following command to deploy Elasticsearch and the Kibana dashboard:

`docker-compose up`

After the dashboard is up, use the public IP of your server on port 5601 to access the Kibana dashboard.

![kibana #center](_images/kibana.png)

Now switch to Stack Management using the menu on the left side, as shown in the image below.

![kibana-data #center](_images/Kibana-data.png)

To make sure that you are receiving data from the sentiment analysis application through Elasticsearch, check whether a Data View is listed in Stack Management.

![kibana-sentiment #center](_images/Kibana-sentiment.png)

You can also check the types of attributes received in the Data View. Next, switch to Dashboards in the left menu and start creating visualizations to analyze the data.

![kibana-dashboard1 #center](_images/Kibana-dashboard1.png)

A sample dashboard structure is shown below, showing the records for the different sentiments.

![kibana-dashboard2 #center](_images/Kibana-dashboard2.png)

Similarly, you can design and create dashboards to analyze a particular set of data. The screenshot below shows the dashboard designed for this learning path.

![kibana-dashboard3 #center](_images/Kibana-dashboard3.png)

Navigate to the `dashboards` directory in the cloned GitHub repository and locate the `sentiment_dashboard.ndjson` file. Import this file into Kibana and you should see the dashboard shown in the previous step.
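
If the Data View does not appear or the dashboard stays empty, a quick look at Elasticsearch itself can help narrow things down. The sketch below interprets a response from the `_cluster/health` endpoint. It assumes Elasticsearch is reachable over plain HTTP on port 9200 with security disabled, as configured in the compose file above. `SAMPLE_HEALTH` is a hand-written illustration, not output captured from a real deployment.

```python
import json


def is_usable(health_body: str) -> bool:
    """Return True when the cluster reports green or yellow status."""
    status = json.loads(health_body)["status"]
    # A single-node cluster may report "yellow" because replica shards
    # cannot be assigned; it can still index and serve data in that state.
    return status in ("green", "yellow")


# Hand-written sample of a GET /_cluster/health response body.
SAMPLE_HEALTH = json.dumps({
    "cluster_name": "docker-cluster",
    "status": "yellow",
    "number_of_nodes": 1,
})

print(is_usable(SAMPLE_HEALTH))
```

To run it against your deployment, fetch `http://<server-ip>:9200/_cluster/health` with any HTTP client and pass the body to `is_usable`.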
@@ -0,0 +1,137 @@
---
title: Cluster monitoring with Prometheus and Grafana in Amazon EKS
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Before you begin

You will need an [AWS account](https://aws.amazon.com/). Create an account if needed.

Four tools are required on your local machine. Follow the links to install them.

* [Kubectl](/install-guides/kubectl/)
* [AWS CLI](/install-guides/aws-cli)
* [Docker](/install-guides/docker)
* [Terraform](/install-guides/terraform)

## Setup sentiment analysis

Clone this GitHub [repository](https://github.com/koleini/spark-sentiment-analysis) to your local workstation. Navigate to the `eks` directory and update the `variables.tf` file with your AWS region.

Execute the following commands to create the Amazon EKS cluster with pre-configured labels.

```console
terraform init
terraform apply --auto-approve
```

Update the `kubeconfig` file to access the deployed EKS cluster with the following command:

```console
aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw cluster_name) --profile <AWS_PROFILE_NAME>
```

Create a service account for Apache Spark:

```console
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
```

## Build the sentiment analysis JAR file

Navigate to the `sentiment_analysis` folder and create a JAR file for the sentiment analyzer:

```console
cd sentiment_analysis
sbt assembly
```

You should see a JAR file created at the following location:

```console
sentiment_analysis/target/scala-2.13/bigdata-assembly-0.1.jar
```

## Create Spark docker container image

Create a repository in Amazon ECR to store the docker images. You can also use Docker Hub.

The Spark repository contains a script to build the Docker image needed for running inside the Kubernetes cluster. Execute this script on your Arm-based laptop to build the arm64 image.

In the current working directory, clone the Apache Spark GitHub repository before building the image:

```console
git clone https://github.com/apache/spark.git
cd spark
git checkout v3.4.3
```

Build the Docker container image using the following commands:

```console
cp ../sentiment_analysis/target/scala-2.13/bigdata-assembly-0.1.jar jars/
bin/docker-image-tool.sh -r <your-docker-repository> -t sentiment-analysis build
bin/docker-image-tool.sh -r <your-docker-repository> -t sentiment-analysis push
```
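
The `docker-image-tool.sh` script names the JVM image `<repository>/spark:<tag>`, which is why the image reference used later with `spark.kubernetes.container.image` ends in `/spark:sentiment-analysis`. The sketch below shows how that reference is composed. The helper name `spark_image_ref` and the registry value are hypothetical placeholders for illustration.

```python
def spark_image_ref(repository: str, tag: str) -> str:
    """Compose the image reference produced by `docker-image-tool.sh -r <repository> -t <tag>`."""
    # Strip a trailing slash so the result never contains "//".
    return f"{repository.rstrip('/')}/spark:{tag}"


# Placeholder registry address; substitute your own ECR or Docker Hub repository.
print(spark_image_ref("<your-docker-repository>", "sentiment-analysis"))
```
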
## Run Spark computation on the cluster

Execute the `spark-submit` command within the Spark folder to deploy the application. The following commands will run the application with two executors, each with 12 cores, and allocate 24GB of memory for both the executors and driver pods.

Set the following variables before executing the `spark-submit` command:

```console
export MASTER_ADDRESS=<K8S_MASTER_ADDRESS>
export ES_ADDRESS=<IP_ADDRESS_OF_ELASTICSEARCH>
export CHECKPOINT_BUCKET=<BUCKET_NAME>
export EKS_ADDRESS=<EKS_REGISTRY_ADDRESS>
```

Execute the following command:

```console
bin/spark-submit \
--class bigdata.SentimentAnalysis \
--master k8s://$MASTER_ADDRESS:443 \
--deploy-mode cluster \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=$EKS_ADDRESS/spark:sentiment-analysis \
--conf spark.kubernetes.driver.pod.name="spark-twitter" \
--conf spark.kubernetes.namespace=default \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.driver.extraJavaOptions="-DES_NODES=$ES_ADDRESS -DCHECKPOINT_LOCATION=s3a://$CHECKPOINT_BUCKET/checkpoints/" \
--conf spark.executor.extraJavaOptions="-DES_NODES=$ES_ADDRESS -DCHECKPOINT_LOCATION=s3a://$CHECKPOINT_BUCKET/checkpoints/" \
--conf spark.executor.cores=12 \
--conf spark.driver.cores=12 \
--conf spark.driver.memory=24g \
--conf spark.executor.memory=24g \
--conf spark.memory.fraction=0.8 \
--name sparkTwitter \
local:///opt/spark/jars/bigdata-assembly-0.1.jar
```
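
The resource settings in the command above determine how much capacity the cluster nodes must provide. The sketch below totals the CPU and memory requested by the driver and the two executors. It assumes Spark's default memory overhead for JVM workloads (10% of the configured memory, with a minimum of 384 MiB) and that no overhead settings are overridden; treat the result as an estimate, not a measured value.

```python
def pod_memory_mib(heap_gib: int, overhead_factor: float = 0.10,
                   min_overhead_mib: int = 384) -> int:
    """Estimate a Spark pod's memory request: configured memory plus overhead."""
    heap_mib = heap_gib * 1024
    # Assumed default: overhead is 10% of the memory, but at least 384 MiB.
    overhead_mib = max(int(heap_mib * overhead_factor), min_overhead_mib)
    return heap_mib + overhead_mib


def total_requests(executors: int, cores: int, heap_gib: int) -> tuple:
    """Total (cores, MiB) for one driver plus `executors` identically sized pods."""
    pods = executors + 1  # the driver uses the same cores and memory here
    return pods * cores, pods * pod_memory_mib(heap_gib)


# Values from the spark-submit command: 2 executors, 12 cores, 24g each.
cores, memory_mib = total_requests(executors=2, cores=12, heap_gib=24)
print(f"{cores} cores, {memory_mib / 1024:.1f} GiB")
```

Under these assumptions the three pods request 36 cores in total, so the node group needs at least that much allocatable CPU for all pods to be scheduled.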

Use `kubectl get pods` to check the status of the pods in the cluster.

```output
NAME                                        READY   STATUS    RESTARTS   AGE
sentimentanalysis-346f22932b484903-exec-1   1/1     Running   0          10m
sentimentanalysis-346f22932b484903-exec-2   1/1     Running   0          10m
spark-twitter                               1/1     Running   0          12m
```

## Twitter sentiment analysis

Create a Twitter (X) [developer account](https://developer.x.com/en/docs/x-api/getting-started/getting-access-to-the-x-api) and generate a bearer token. Use the following script to fetch the tweets:

```console
export BEARER_TOKEN=<BEARER_TOKEN_FROM_X>
python3 scripts/xapi_tweets.py
```

You can modify the `xapi_tweets.py` script to use your own keywords. Update the following section in the script to do so:

```python
query_params = {'query': "(#onArm OR @Arm OR #Arm OR #GenAI) -is:retweet lang:en",
                'tweet.fields': 'lang'}
```
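
To see how these parameters reach the API, the sketch below builds (but does not send) a request to the X API v2 recent search endpoint using only the Python standard library. It is an illustration under assumptions, not the contents of `xapi_tweets.py`: the helper name `build_search_request` is hypothetical, and the script in the repository may use a different HTTP client or endpoint.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request

# X API v2 recent search endpoint (assumed to be the one the script targets).
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_request(bearer_token: str, params: dict) -> Request:
    """Create a GET request with URL-encoded query parameters and bearer auth."""
    url = f"{SEARCH_URL}?{urlencode(params)}"
    return Request(url, headers={"Authorization": f"Bearer {bearer_token}"})


query_params = {'query': "(#onArm OR @Arm OR #Arm OR #GenAI) -is:retweet lang:en",
                'tweet.fields': 'lang'}

# Falls back to a placeholder when BEARER_TOKEN is not set in the environment.
token = os.environ.get("BEARER_TOKEN", "<BEARER_TOKEN_FROM_X>")
request = build_search_request(token, query_params)
print(request.full_url)
```

Printing the URL shows how characters such as `#`, `@`, and `:` in the query are percent-encoded before the request is sent.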
@@ -0,0 +1,27 @@
---
title: What is Twitter Sentiment Analysis
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## What is Sentiment Analysis

Sentiment analysis is a natural language processing technique used to identify and categorize opinions expressed in a piece of text, such as a tweet or a product review. Social media platforms such as Twitter provide a wealth of information about public opinion, trends, and events. Applying sentiment analysis to this data shows how people feel about a particular topic or issue, which helps to gauge public opinion, identify emerging trends and patterns, and improve decision-making.
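
As a rough illustration of what "categorizing opinions" means, the toy sketch below labels a piece of text by counting words from two small hand-picked word lists. This is not the model used in this learning path; the word lists and the `classify` function are hypothetical and far simpler than a trained classifier.

```python
# Hypothetical toy lexicons, chosen only for illustration.
POSITIVE = {"good", "great", "love", "fast", "excellent"}
NEGATIVE = {"bad", "slow", "hate", "broken", "terrible"}


def classify(text: str) -> str:
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    # Lowercase each word and strip common punctuation and tweet symbols.
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for tweet in ("I love this great product!",
              "The build is slow and broken.",
              "Deploying on Arm today"):
    print(tweet, "->", classify(tweet))
```

A trained model replaces the word lists with learned weights, but the output has the same shape: one sentiment label per piece of text.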


## Real-time sentiment analysis with Arm-based Amazon EKS clusters

Real-time sentiment analysis is a compute-intensive task that can quickly drive up resource usage and costs if it is not managed effectively. Tracking sentiment changes in real time lets organizations understand sentiment patterns and act on them promptly.

![sentiment analysis #center](_images/Sentiment-Analysis.png)

The high-level technology stack for the solution is as follows:

- Twitter (X) Developer API to fetch tweets based on certain keywords
- Amazon Kinesis to process the captured data
- A sentiment analyzer model to classify the text and tone of tweets
- Apache Spark streaming API to process the sentiment of tweets
- Elasticsearch and Kibana to store the processed tweets and display them on a dashboard
- Prometheus and Grafana to monitor the CPU and RAM usage of the Amazon EKS cluster
@@ -0,0 +1,38 @@
---
title: Learn how to perform Twitter(X) Sentiment Analysis on Arm-based EKS clusters

minutes_to_complete: 60

who_is_this_for: This is an advanced topic for software developers who want to build an end-to-end ML solution that analyzes the sentiment of live tweets on an Arm-based Amazon EKS cluster.

learning_objectives:
- Deploy a text classification model on Amazon EKS with Apache Spark
- Deploy Elasticsearch and a Kibana dashboard to analyze the tweets
- Deploy Prometheus and a Grafana dashboard to track the CPU and RAM usage of the Kubernetes nodes

prerequisites:
- An [AWS account](https://aws.amazon.com/). Create an account if needed.
- A computer with [Amazon eksctl CLI](/install-guides/eksctl) and [kubectl](/install-guides/kubectl/) installed.
- [Docker](/install-guides/docker) installed on your local computer.

author_primary: Pranay Bakre, Masoud Koleini, Nobel Chowdary Mandepudi, Na Li

### Tags
skilllevels: Advanced
subjects: Containers and Virtualization
cloud_service_providers: AWS
armips:
- Neoverse
tools_software_languages:
- Kubernetes
- AWS Elastic Kubernetes Service (EKS)
operatingsystems:
- Linux


### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---