Merge pull request #1454 from pbk8s/main

Sentiment Analysis Learning Path
ArmDeveloperEcosystem · Jan 2, 2025 · 3e691fb · 3e691fb
2 parents 313aafc + 2093816
commit 3e691fb
Showing 15 changed files with 427 additions and 0 deletions.
diff --git a/...puting/sentiment-analysis-eks/Cluster monitoring with Prometheus and Grafana.md b/...puting/sentiment-analysis-eks/Cluster monitoring with Prometheus and Grafana.md
@@ -0,0 +1,93 @@
+---
+title: Cluster monitoring with Prometheus and Grafana in Amazon EKS
+weight: 5
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## CPU and RAM usage statistics with Prometheus and Grafana
+
+Prometheus is a monitoring and alerting tool used to collect and query real-time metrics in cloud-native environments such as Kubernetes. It gathers essential metrics (for example CPU and memory usage, pod counts, and request latency) that help you monitor the health and performance of Kubernetes clusters. Grafana is a visualization and analytics tool that uses Prometheus as a data source to build interactive dashboards for monitoring and analyzing Kubernetes metrics over time.
+
+
+## Install Prometheus on Arm-based EKS cluster
+
+This learning path uses `helm` to install Prometheus on the Kubernetes cluster. Follow the [Helm documentation](https://helm.sh/docs/intro/install/) to install it on your laptop.
+
+Create a namespace in your EKS cluster to host the `prometheus` pods:
+
+```console
+kubectl create namespace prometheus
+```
+
+Add the following Helm repo for Prometheus:
+
+```console
+helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+```
+
+Install `prometheus` on the cluster with the following command:
+
+```console
+helm install prometheus prometheus-community/prometheus \
+  --namespace prometheus \
+  --set alertmanager.persistentVolume.storageClass="gp2" \
+  --set server.persistentVolume.storageClass="gp2"
+```
+
+Check that all pods are up and running:
+
+```console
+kubectl get pods -n prometheus
+```
+
+
+## Install Grafana on Arm-based EKS cluster
+
+Add the following Helm repo for Grafana:
+
+```console
+helm repo add grafana https://grafana.github.io/helm-charts
+```
+
+Create a `grafana.yaml` file with the following contents:
+
+```yaml
+datasources:
+  datasources.yaml:
+    apiVersion: 1
+    datasources:
+    - name: Prometheus
+      type: prometheus
+      url: http://prometheus-server.prometheus.svc.cluster.local
+      access: proxy
+      isDefault: true
+```
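The `url` field above points Grafana at the Prometheus server service inside the cluster. As a rough illustration of what a request against that data source looks like, the sketch below builds an instant-query URL for the Prometheus HTTP API (`/api/v1/query`). The helper name `instant_query_url` and the example expression `up` are illustrative choices and not part of this learning path; the base URL is the one from `grafana.yaml` and only resolves from inside the cluster.

```python
from urllib.parse import urlencode

# Service URL taken from grafana.yaml above; it resolves only inside the cluster.
PROMETHEUS_URL = "http://prometheus-server.prometheus.svc.cluster.local"


def instant_query_url(base_url: str, promql: str) -> str:
    """Return the Prometheus HTTP API URL for an instant query.

    Hypothetical helper for illustration: it only builds the URL and
    does not contact the server.
    """
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# `up` is 1 for every scrape target whose last scrape succeeded.
print(instant_query_url(PROMETHEUS_URL, "up"))
```

Fetching the printed URL from a pod in the cluster (for example with `curl`) should return a JSON document, which is a quick way to confirm the data source address before installing Grafana.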
+
+Create another namespace for the `grafana` pods:
+
+```console
+kubectl create namespace grafana
+```
+
+Install `grafana` on the cluster with the following command:
+
+```console
+helm install grafana grafana/grafana \
+  --namespace grafana \
+  --set persistence.storageClassName="gp2" \
+  --set persistence.enabled=true \
+  --set adminPassword='kubegrafana' \
+  --values grafana.yaml \
+  --set service.type=LoadBalancer
+```
+
+Check that all pods are up and running:
+
+```console
+kubectl get pods -n grafana
+```
+
+Log in to the Grafana dashboard using the LoadBalancer IP and click `Dashboards` in the left navigation pane. Locate the `Kubernetes / Compute Resources / Node` dashboard and click it. You should see a dashboard like the one below for your Kubernetes cluster:
+
+![grafana #center](_images/grafana.png)
diff --git a/...ud-computing/sentiment-analysis-eks/Monitoring with Elasticsearch and Kibana.md b/...ud-computing/sentiment-analysis-eks/Monitoring with Elasticsearch and Kibana.md
@@ -0,0 +1,78 @@
+---
+title: Monitoring the sentiments with Elasticsearch and Kibana
+weight: 4
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Deploy Elasticsearch and Kibana on Arm-based EC2 instance
+
+Elasticsearch is a NoSQL database and a search and analytics engine designed to store, search, and analyze large amounts of data. Its real-time indexing capability is crucial for handling high-velocity data streams such as tweets. Kibana is a dashboard and visualization tool that integrates with Elasticsearch and provides an interface to explore the Twitter data, apply filters, and receive alerts. There are multiple ways to install Elasticsearch and Kibana; one method is shown below.
+
+Before you begin, ensure that Docker and Docker Compose are installed on the machine that will host Elasticsearch and Kibana.
+
+Create the following `docker-compose.yml` file:
+
+```yml
+version: '3.8'
+services:
+  elasticsearch:
+    image: elasticsearch:8.15.2
+    container_name: elasticsearch
+    environment:
+      - discovery.type=single-node
+      - ES_JAVA_OPTS=-Xms512m -Xmx512m
+      - xpack.security.enabled=false
+      - HTTP_ENABLE=true
+    ports:
+      - "9200:9200"
+    networks:
+      - elk
+
+  kibana:
+    image: kibana:8.15.2
+    container_name: kibana
+    ports:
+      - "5601:5601"
+    environment:
+      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
+      - HTTP_ENABLE=true
+    depends_on:
+      - elasticsearch
+    networks:
+      - elk
+
+networks:
+  elk:
+    driver: bridge
+```
+
+Use the following command to deploy Elasticsearch and Kibana:
+
+`docker-compose up`
+
+After the dashboard is up, use the public IP of your server on port 5601 to access the Kibana dashboard.
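Before opening Kibana, it can help to confirm that Elasticsearch itself is answering on port 9200. The sketch below is one possible check, assuming the standard `_cluster/health` endpoint and the unauthenticated HTTP setup from the compose file above. The function names are hypothetical, and `fetch_health` needs network access to your server, so only the pure check is exercised here with a sample response.

```python
import json
from urllib.request import urlopen


def is_cluster_usable(health: dict) -> bool:
    """Treat green or yellow as usable.

    A single-node cluster is often yellow because replica shards
    have no second node to be assigned to.
    """
    return health.get("status") in {"green", "yellow"}


def fetch_health(es_url: str) -> dict:
    """Fetch the cluster health document (needs network access to Elasticsearch)."""
    with urlopen(f"{es_url.rstrip('/')}/_cluster/health", timeout=5) as response:
        return json.load(response)


# Sample response fragment for illustration; real responses carry more fields.
sample = {"cluster_name": "docker-cluster", "status": "yellow", "number_of_nodes": 1}
print(is_cluster_usable(sample))
```

On the server you would call `is_cluster_usable(fetch_health("http://<PUBLIC_IP>:9200"))`, with `<PUBLIC_IP>` replaced by the address you use for Kibana.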
+
+![kibana #center](_images/kibana.png)
+
+Now switch to Stack Management using the menu on the left side, as shown in the image below.
+
+![kibana-data #center](_images/Kibana-data.png)
+
+To confirm that you are receiving data from the sentiment analysis application through Elasticsearch, check whether a Data View is listed in Stack Management.
+
+![kibana-sentiment #center](_images/Kibana-sentiment.png)
+
+You can also check the types of attributes received in the Data View. Now, you can switch to Dashboards in the left menu and start creating visualizations to analyze the data.
+
+![kibana-dashboard1 #center](_images/Kibana-dashboard1.png)
+
+A sample dashboard structure is shown below, with the records broken down by sentiment.
+
+![kibana-dashboard2 #center](_images/Kibana-dashboard2.png)
+
+Similarly, you can design and create dashboards to analyze a particular set of data. The screenshot below shows the dashboard designed for this learning path:
+
+![kibana-dashboard3 #center](_images/Kibana-dashboard3.png)
+
+Navigate to the `dashboards` directory in the cloned GitHub repository and locate the `sentiment_dashboard.ndjson` file. Import this file into Kibana and you should see the dashboard shown in the previous step.
diff --git a/...-paths/servers-and-cloud-computing/sentiment-analysis-eks/Sentiment Analysis.md b/...-paths/servers-and-cloud-computing/sentiment-analysis-eks/Sentiment Analysis.md
@@ -0,0 +1,137 @@
+---
+title: Deploy sentiment analysis with Apache Spark on Amazon EKS
+weight: 3
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Before you begin
+
+You will need an [AWS account](https://aws.amazon.com/). Create an account if needed. 
+
+Four tools are required on your local machine. Follow the links to install them.
+
+* [Kubectl](/install-guides/kubectl/)
+* [AWS CLI](/install-guides/aws-cli)
+* [Docker](/install-guides/docker)
+* [Terraform](/install-guides/terraform)
+
+## Setup sentiment analysis
+
+Clone this GitHub [repository](https://github.com/koleini/spark-sentiment-analysis) on your local workstation. Navigate to the `eks` directory and update the `variables.tf` file with your AWS region.
+
+Execute the following commands to create the Amazon EKS cluster with pre-configured labels.
+
+```console
+terraform init
+terraform apply --auto-approve
+```
+
+Update the `kubeconfig` file to access the deployed EKS cluster with the following command:
+
+```console
+aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw cluster_name) --profile <AWS_PROFILE_NAME>
+```
+
+Create a service account for Apache Spark:
+
+```console
+kubectl create serviceaccount spark
+kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
+```
+
+## Build the sentiment analysis JAR file
+
+Navigate to the `sentiment_analysis` folder and create a JAR file for the sentiment analyzer:
+
+```console
+cd sentiment_analysis
+sbt assembly
+```
+
+You should see a JAR file created at the following location:
+
+```console
+sentiment_analysis/target/scala-2.13/bigdata-assembly-0.1.jar
+```
+
+## Create Spark docker container image
+
+Create a repository in Amazon ECR to store the Docker images. You can also use Docker Hub.
+
+The Spark repository contains a script to build the Docker image needed for running inside the Kubernetes cluster. Execute this script on your Arm-based laptop to build the arm64 image.
+
+In the current working directory, clone the Apache Spark GitHub repository before building the image:
+
+```console
+git clone https://github.com/apache/spark.git
+cd spark
+git checkout v3.4.3
+```
+
+Build the Docker container image using the following commands:
+
+```console
+cp ../sentiment_analysis/target/scala-2.13/bigdata-assembly-0.1.jar jars/
+bin/docker-image-tool.sh -r <your-docker-repository> -t sentiment-analysis build
+bin/docker-image-tool.sh -r <your-docker-repository> -t sentiment-analysis push
+```
+
+## Run Spark computation on the cluster
+
+Execute the `spark-submit` command from within the Spark folder to deploy the application. The following command runs the application with two executors, each with 12 cores, and allocates 24 GB of memory to each executor pod and to the driver pod.
+
+Set the following variables before executing the `spark-submit` command:
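To size the node group for these settings, it can be useful to add up what the driver and executor pods will request. The sketch below does that arithmetic with the values from this section. The overhead figures (10% of the JVM heap, with a 384 MiB minimum) are an assumption based on Spark's documented defaults for JVM jobs and are not set anywhere in this learning path; the helper name `pod_memory_mib` is hypothetical.

```python
# Values copied from the spark-submit settings described in this section.
EXECUTOR_INSTANCES = 2
CORES_PER_POD = 12
MEMORY_PER_POD_MIB = 24 * 1024  # 24g

# Assumption: Spark's default memory overhead for JVM jobs.
OVERHEAD_FACTOR = 0.10
MIN_OVERHEAD_MIB = 384


def pod_memory_mib(heap_mib: int) -> int:
    """Approximate pod memory request: JVM heap plus non-heap overhead."""
    overhead = max(int(heap_mib * OVERHEAD_FACTOR), MIN_OVERHEAD_MIB)
    return heap_mib + overhead


pods = EXECUTOR_INSTANCES + 1  # two executors plus one driver
total_cores = pods * CORES_PER_POD
total_memory_mib = pods * pod_memory_mib(MEMORY_PER_POD_MIB)
print(f"{pods} pods, {total_cores} cores, {total_memory_mib / 1024:.1f} GiB requested")
```

Under these assumptions the three pods together ask for 36 cores and roughly 79 GiB of memory, so the cluster nodes need at least that much allocatable capacity for every pod to be scheduled.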
+
+```console
+export MASTER_ADDRESS=<K8S_MASTER_ADDRESS>
+export ES_ADDRESS=<IP_ADDRESS_OF_ELASTICSEARCH>
+export CHECKPOINT_BUCKET=<BUCKET_NAME>
+export EKS_ADDRESS=<EKS_REGISTRY_ADDRESS>
+```
+
+Execute the following command:
+
+```console
+bin/spark-submit \
+      --class bigdata.SentimentAnalysis \
+      --master k8s://$MASTER_ADDRESS:443 \
+      --deploy-mode cluster \
+      --conf spark.executor.instances=2 \
+      --conf spark.kubernetes.container.image=$EKS_ADDRESS/spark:sentiment-analysis \
+      --conf spark.kubernetes.driver.pod.name="spark-twitter" \
+      --conf spark.kubernetes.namespace=default \
+      --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
+      --conf spark.driver.extraJavaOptions="-DES_NODES=$ES_ADDRESS -DCHECKPOINT_LOCATION=s3a://$CHECKPOINT_BUCKET/checkpoints/" \
+      --conf spark.executor.extraJavaOptions="-DES_NODES=$ES_ADDRESS -DCHECKPOINT_LOCATION=s3a://$CHECKPOINT_BUCKET/checkpoints/" \
+      --conf spark.executor.cores=12 \
+      --conf spark.driver.cores=12 \
+      --conf spark.driver.memory=24g \
+      --conf spark.executor.memory=24g \
+      --conf spark.memory.fraction=0.8 \
+      --name sparkTwitter \
+      local:///opt/spark/jars/bigdata-assembly-0.1.jar
+```
+
+Use `kubectl get pods` to check the status of the pods in the cluster.
+
+```output
+NAME                                        READY   STATUS    RESTARTS   AGE
+sentimentanalysis-346f22932b484903-exec-1   1/1     Running   0          10m
+sentimentanalysis-346f22932b484903-exec-2   1/1     Running   0          10m
+spark-twitter                               1/1     Running   0          12m
+```
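If you want to script this check, the plain-text table can be parsed directly. The sketch below is a minimal example that reads output in the format shown above and lists any pod whose status is not `Running`. The function names are hypothetical, and the sample text is the output from this section rather than a live cluster.

```python
SAMPLE_OUTPUT = """\
NAME                                        READY   STATUS    RESTARTS   AGE
sentimentanalysis-346f22932b484903-exec-1   1/1     Running   0          10m
sentimentanalysis-346f22932b484903-exec-2   1/1     Running   0          10m
spark-twitter                               1/1     Running   0          12m
"""


def pod_statuses(kubectl_output: str) -> dict[str, str]:
    """Map pod name to STATUS from the plain-text output of `kubectl get pods`."""
    lines = kubectl_output.strip().splitlines()
    status_column = lines[0].split().index("STATUS")
    return {row.split()[0]: row.split()[status_column] for row in lines[1:]}


def not_running(kubectl_output: str) -> list[str]:
    """Return the names of pods whose status is anything other than Running."""
    return [name for name, status in pod_statuses(kubectl_output).items() if status != "Running"]


print(not_running(SAMPLE_OUTPUT))
```

An empty list means the driver and both executors are running. For anything beyond a quick check, `kubectl get pods -o json` gives a structured form that is more robust to parse.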
+
+## Twitter sentiment analysis
+
+Create a Twitter (X) [developer account](https://developer.x.com/en/docs/x-api/getting-started/getting-access-to-the-x-api) and generate a `bearer token`. Use the following script to fetch the tweets:
+
+```console
+export BEARER_TOKEN=<BEARER_TOKEN_FROM_X>
+python3 scripts/xapi_tweets.py
+```
+
+You can modify the `xapi_tweets.py` script to use your own keywords by updating the following section:
+
+```python
+query_params = {'query': "(#onArm OR @Arm OR #Arm OR #GenAI) -is:retweet lang:en",
+                'tweet.fields': 'lang'}
+```
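The query string follows a simple pattern: keywords joined with `OR`, a retweet filter, and a language filter. The sketch below shows one way to assemble it from a list of keywords. `build_query` is a hypothetical helper and is not part of `xapi_tweets.py`; it reproduces the query shown above when given the same keywords.

```python
def build_query(keywords: list[str], lang: str = "en", include_retweets: bool = False) -> str:
    """Build a search query: (kw1 OR kw2 ...) [-is:retweet] lang:<code>."""
    parts = [f"({' OR '.join(keywords)})"]
    if not include_retweets:
        parts.append("-is:retweet")
    parts.append(f"lang:{lang}")
    return " ".join(parts)


query_params = {
    "query": build_query(["#onArm", "@Arm", "#Arm", "#GenAI"]),
    "tweet.fields": "lang",
}
print(query_params["query"])
```

Swapping the keyword list is then enough to track a different topic without editing the filter syntax by hand.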
diff --git a/...ths/servers-and-cloud-computing/sentiment-analysis-eks/Understand the basics.md b/...ths/servers-and-cloud-computing/sentiment-analysis-eks/Understand the basics.md
@@ -0,0 +1,27 @@
+---
+title: What is Twitter Sentiment Analysis
+weight: 2
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## What is Sentiment Analysis
+
+Sentiment analysis is a natural language processing technique used to identify and categorize opinions expressed in a piece of text, such as a tweet or a product review. Social media platforms such as Twitter provide a wealth of information about public opinion, trends, and events. Analyzing that text provides insight into how people feel about a particular topic or issue, helps identify emerging trends and patterns, and supports better decision-making.
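To make the idea of categorizing text concrete, the sketch below is a deliberately simple word-list classifier. It is a toy illustration only, not the Sentiment Analyzer model used in this learning path; the word lists, the `classify` function, and the example tweets are all invented for the example.

```python
import re

# Tiny hand-picked word lists, for illustration only.
POSITIVE = {"good", "great", "love", "fast", "excellent"}
NEGATIVE = {"bad", "slow", "hate", "broken", "terrible"}


def classify(text: str) -> str:
    """Label text by counting positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for tweet in ["Love how fast this runs on Arm", "The build is broken again", "Deploying to EKS today"]:
    print(f"{classify(tweet)}: {tweet}")
```

A trained model replaces the fixed word lists with learned weights, which lets it handle negation, sarcasm, and context that a word count misses.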
+
+
+## Real-time sentiment analysis with Arm-based Amazon EKS clusters
+
+Real-time sentiment analysis is a compute-intensive task that can quickly drive up resource usage and costs if not managed effectively. Tracking sentiment changes in real time enables organizations to understand sentiment patterns, make informed decisions, and act on them promptly.
+
+![sentiment analysis #center](_images/Sentiment-Analysis.png)
+
+The high-level technology stack for the solution is as follows:
+
+- Twitter (X) Developer API to fetch tweets based on certain keywords
+- Amazon Kinesis to process the captured data
+- Sentiment Analyzer model to classify the text and tone of tweets
+- Apache Spark streaming API to process the sentiment of tweets
+- Elasticsearch and Kibana to store the processed tweets and display them on a dashboard
+- Prometheus and Grafana to monitor the CPU and RAM resources of the Amazon EKS cluster
diff --git a/...ervers-and-cloud-computing/sentiment-analysis-eks/_images/Kibana-dashboard1.png b/...ervers-and-cloud-computing/sentiment-analysis-eks/_images/Kibana-dashboard1.png
diff --git a/...ervers-and-cloud-computing/sentiment-analysis-eks/_images/Kibana-dashboard2.png b/...ervers-and-cloud-computing/sentiment-analysis-eks/_images/Kibana-dashboard2.png
diff --git a/...ervers-and-cloud-computing/sentiment-analysis-eks/_images/Kibana-dashboard3.png b/...ervers-and-cloud-computing/sentiment-analysis-eks/_images/Kibana-dashboard3.png
diff --git a/...aths/servers-and-cloud-computing/sentiment-analysis-eks/_images/Kibana-data.png b/...aths/servers-and-cloud-computing/sentiment-analysis-eks/_images/Kibana-data.png
diff --git a/...servers-and-cloud-computing/sentiment-analysis-eks/_images/Kibana-sentiment.png b/...servers-and-cloud-computing/sentiment-analysis-eks/_images/Kibana-sentiment.png
diff --git a/...rvers-and-cloud-computing/sentiment-analysis-eks/_images/Sentiment-Analysis.png b/...rvers-and-cloud-computing/sentiment-analysis-eks/_images/Sentiment-Analysis.png
diff --git a/...ng-paths/servers-and-cloud-computing/sentiment-analysis-eks/_images/grafana.png b/...ng-paths/servers-and-cloud-computing/sentiment-analysis-eks/_images/grafana.png
diff --git a/...ing-paths/servers-and-cloud-computing/sentiment-analysis-eks/_images/kibana.png b/...ing-paths/servers-and-cloud-computing/sentiment-analysis-eks/_images/kibana.png
diff --git a/...ent/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/_index.md b/...ent/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/_index.md
@@ -0,0 +1,38 @@
+---
+title: Learn how to perform Twitter (X) Sentiment Analysis on Arm-based EKS clusters
+
+minutes_to_complete: 60
+
+who_is_this_for: This is an advanced topic for software developers who would like to build an end-to-end ML solution to analyze the sentiments of live tweets with an Arm-based Amazon EKS cluster.
+
+learning_objectives: 
+    - Deploy text classification model on Amazon EKS with Apache Spark
+    - Learn how to deploy Elasticsearch and Kibana dashboard to analyze the tweets
+    - Deploy Prometheus and Grafana dashboard to keep track of CPU and RAM usage of Kubernetes nodes
+
+prerequisites:
+    - An [AWS account](https://aws.amazon.com/). Create an account if needed.
+    - A computer with [Amazon eksctl CLI](/install-guides/eksctl) and [kubectl](/install-guides/kubectl/) installed.
+    - Docker installed on your local computer. See the [Docker install guide](/install-guides/docker).
+
+author_primary: Pranay Bakre, Masoud Koleini, Nobel Chowdary Mandepudi, Na Li
+
+### Tags
+skilllevels: Advanced
+subjects: Containers and Virtualization
+cloud_service_providers: AWS
+armips:
+    - Neoverse
+tools_software_languages:
+    - Kubernetes
+    - AWS Elastic Kubernetes Service (EKS)
+operatingsystems:
+    - Linux
+
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1                       # _index.md always has weight of 1 to order correctly
+layout: "learningpathall"       # All files under learning paths have this same wrapper
+learning_path_main_page: "yes"  # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
+---