# Cell Tower Anomaly Detection using Serverless Spark through Google Cloud Composer

Goal - Detect anomalous cell towers using network and customer data, processed with Dataproc Serverless Spark.

Following are the lab modules:

1. Understanding the data
2. Solution Diagram
3. Uploading DAG files to the DAGs folder
4. Execution of the Airflow DAG
5. BigQuery output tables
6. Logging

## 1. Understanding the data

The datasets used for this project are:

1. telecom_customer_churn_data.csv
2. service_threshold_data.csv
3. customer_data

- Telecom Customer Churn Data - This dataset contains information about the services provided to customers by the cell towers.
- Service Threshold Data - This dataset contains the performance metric threshold information for the cell towers.
- Customer Raw Data - This folder contains files in Parquet format that hold the customer data.
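
If you want to inspect these inputs before running the pipeline, a minimal PySpark sketch such as the one below works; the `gs://` paths are placeholders for wherever you stage the files, not the exact layout of this repository.

```python
# A minimal sketch for inspecting the three inputs; replace the
# placeholder paths with the locations where you staged the data.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("explore-inputs").getOrCreate()

# CSV datasets (a header row is assumed)
churn_df = spark.read.option("header", True).csv(
    "gs://<bucket_name>/<path_to_data>/telecom_customer_churn_data.csv"
)
threshold_df = spark.read.option("header", True).csv(
    "gs://<bucket_name>/<path_to_data>/service_threshold_data.csv"
)

# Customer raw data: a folder of Parquet files
customer_df = spark.read.parquet(
    "gs://<bucket_name>/<path_to_data>/customer_data"
)

churn_df.printSchema()
threshold_df.show(5)
customer_df.show(5)
```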

## 2. Solution Diagram



### Model Pipeline

The model pipeline involves the following steps:

- Create buckets in GCS
- Create a Dataproc Persistent History Server cluster
- Copy the raw data files, PySpark scripts, and notebook files into GCS
- Create a Cloud Composer environment and an Airflow DAG to run the serverless Spark job
- Create BigQuery tables with a summary of anomalous cell towers
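
For orientation, here is a minimal sketch of the core of such an Airflow DAG: submitting a Dataproc Serverless batch wired to the Persistent History Server. All names below are placeholders; the actual DAG in cell-tower-airflow.py defines its own IDs, arguments, and task sequence.

```python
# A hedged sketch of an Airflow DAG that submits a Dataproc Serverless
# batch; project, region, bucket, and cluster names are placeholders.
from datetime import datetime

from airflow import models
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateBatchOperator,
)

PROJECT_ID = "<project_id>"
REGION = "<region>"
PHS_CLUSTER = "<persistent_history_server_cluster>"

BATCH_CONFIG = {
    "pyspark_batch": {
        "main_python_file_uri": "gs://<bucket_name>/<path_to_pyspark_script>.py",
    },
    "environment_config": {
        "peripherals_config": {
            "spark_history_server_config": {
                # Attach the Persistent History Server so Spark UI data survives
                "dataproc_cluster": (
                    f"projects/{PROJECT_ID}/regions/{REGION}/clusters/{PHS_CLUSTER}"
                ),
            },
        },
    },
}

with models.DAG(
    dag_id="<your_dag_id>",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,  # trigger manually from the Airflow UI
    catchup=False,
) as dag:
    run_batch = DataprocCreateBatchOperator(
        task_id="run_serverless_spark_batch",
        project_id=PROJECT_ID,
        region=REGION,
        batch=BATCH_CONFIG,
        batch_id="<your_batch_id>",
    )
```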


## 3. Uploading DAG files to the DAGs folder

- From the code repository, download the file located at: cell-tower-anomaly-detection > 00-scripts > cell-tower-airflow.py
- Rename the file to <your_name_here>-cell-tower-airflow.py
- Open the file and substitute your name on line 21
- Navigate to Composer > <composer_environment>
- Next, navigate to Environment Configuration > DAGs folder URI
- Next, upload the DAG file to the GCS bucket corresponding to the DAGs folder URI
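
If you prefer the command line to the console upload, a sketch using the google-cloud-storage client library is below; the bucket name is a placeholder for the bucket shown in the DAGs folder URI.

```python
# Optional: upload the renamed DAG file programmatically instead of via
# the console; <composer_dags_bucket> is the bucket from the DAGs folder URI.
from google.cloud import storage

client = storage.Client(project="<project_id>")
bucket = client.bucket("<composer_dags_bucket>")

# Composer reads DAG files from the dags/ prefix of its bucket.
blob = bucket.blob("dags/<your_name_here>-cell-tower-airflow.py")
blob.upload_from_filename("<your_name_here>-cell-tower-airflow.py")
print(f"Uploaded to gs://{bucket.name}/{blob.name}")
```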






## 4. Execution of the Airflow DAG

- Navigate to Composer > <your_environment> > Open Airflow UI
- Once the Airflow UI opens, navigate to DAGs and open your respective DAG
- Next, trigger your DAG by clicking on the Trigger DAG button
- Once the DAG is triggered, it can be monitored directly through the Airflow UI, as well as through the Dataproc > Serverless > Batches window
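
As an alternative to the Trigger DAG button, a Cloud Composer 2 environment also exposes Airflow's stable REST API; a sketch is below, where <airflow_web_server_url> and <your_dag_id> are placeholders taken from your environment details page and your DAG file.

```python
# Optional: trigger the DAG through Airflow's stable REST API instead of
# the UI. Assumes a Cloud Composer 2 environment with IAM-based access.
import google.auth
from google.auth.transport.requests import AuthorizedSession

AIRFLOW_URL = "<airflow_web_server_url>"  # from the environment details page
DAG_ID = "<your_dag_id>"

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
session = AuthorizedSession(credentials)

# POST a new DAG run for the given DAG ID.
response = session.post(
    f"{AIRFLOW_URL}/api/v1/dags/{DAG_ID}/dagRuns",
    json={"conf": {}},
)
response.raise_for_status()
print(response.json())
```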

## 5. BigQuery output tables

Navigate to the BigQuery console and check the cell_tower_anomaly_detection dataset.
Once the Airflow DAG execution completes, two new tables, '<your_name_here>_cell_tower_performance_data' and '<your_name_here>_customer_service_metrics_data', will be created:



To view the data in these tables:

- Select the table in the BigQuery Explorer by navigating to 'project_id' > 'dataset' > 'table_name'
- Click on the Preview button to see the data in the table


Note: If the Preview button is not visible, run the queries below to view the data. Be aware that these queries incur charges for a full table scan.

```sql
SELECT * FROM `<project_name>.<dataset_name>.<your_name_here>_cell_tower_performance_data` LIMIT 1000;
SELECT * FROM `<project_name>.<dataset_name>.<your_name_here>_customer_service_metrics_data` LIMIT 1000;
```

Note: Edit all occurrences of <project_name> and <dataset_name> to match the values of the variables PROJECT_ID and BQ_DATASET_NAME respectively.
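
As a no-scan alternative, the BigQuery client library's list_rows() reads rows through the tabledata.list API rather than running a query, so it avoids the full-table-scan charge; the sketch below assumes the same placeholder names as the queries above.

```python
# Optional: preview rows without a query. list_rows() uses tabledata.list,
# so unlike SELECT * it does not bill a table scan. Replace the
# placeholders with your PROJECT_ID and BQ_DATASET_NAME values.
from google.cloud import bigquery

client = bigquery.Client(project="<project_name>")

for table_name in (
    "<your_name_here>_cell_tower_performance_data",
    "<your_name_here>_customer_service_metrics_data",
):
    rows = client.list_rows(
        f"<project_name>.<dataset_name>.{table_name}", max_results=10
    )
    print(f"--- {table_name} ---")
    for row in rows:
        print(dict(row))
```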



## 6. Logging

### 6.1 Airflow logging

- To view the logs of any step of the DAG execution, click on the task and then click the Log button

### 6.2 Serverless Batch logs

Logs associated with the application can be found in the Logging console under Dataproc > Serverless > Batches > <batch_name>.
You can also click the 'View Logs' button on the Dataproc Batches monitoring page to get to the Logging page for the specific Spark job.
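
If you prefer to pull these logs programmatically, a sketch using the Cloud Logging client library is below; the batch name is a placeholder, and the filter assumes the cloud_dataproc_batch resource type used by Dataproc Serverless batches.

```python
# Optional: fetch a batch's log entries with the Cloud Logging client.
# <batch_name> is a placeholder for the batch ID shown in the console.
from google.cloud import logging as cloud_logging

client = cloud_logging.Client(project="<project_id>")

log_filter = (
    'resource.type="cloud_dataproc_batch" '
    'AND resource.labels.batch_id="<batch_name>"'
)

# Print the most relevant fields of the first 20 matching entries.
for entry in client.list_entries(filter_=log_filter, max_results=20):
    print(entry.timestamp, entry.payload)
```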


### 6.3 Persistent History Server logs

To view the Persistent History Server logs, click the 'View History Server' button on the Dataproc Batches monitoring page, and the logs will be displayed in the Spark History Server UI: