
Data Wrangling, Exploration and Visualization

Rajdeep Biswas edited this page Jun 12, 2020 · 3 revisions

Data Wrangling and Enrichment

Data is cleansed and enriched using SparkR and SparkSQL. The curated dataset is written to Azure Blob storage in Parquet format (parquet.apache.org, n.d.), partitioned by city name. The code is available in the “Step02a_Data_Wrangling” R notebook in the artifacts section.

Final Data Structure:

Final Data Structure

Sink Storage:

storage
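
To make "partitioned by city name" concrete: Spark writes partitioned output as one directory per partition value, named `column=value`, and keeps the partition column in the path instead of in the data files. The sketch below only mimics that directory layout with the Python standard library; it is not the notebook's SparkR code. The column name `City`, the base path `curated`, and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def partition_rows(rows, column, base="curated"):
    """Group rows into Hive-style `column=value` directories (layout only, no I/O)."""
    layout = defaultdict(list)
    for row in rows:
        # The partition value lives in the directory name, so it is
        # removed from the row that would be stored in the data file.
        remaining = {k: v for k, v in row.items() if k != column}
        directory = str(PurePosixPath(base) / f"{column}={row[column]}")
        layout[directory].append(remaining)
    return dict(layout)


# Hypothetical sample rows; the real schema is shown in the figure above.
rows = [
    {"City": "Chicago", "Category": "Graffiti"},
    {"City": "Boston", "Category": "Pothole"},
    {"City": "Chicago", "Category": "Pothole"},
]

layout = partition_rows(rows, "City")
for directory in sorted(layout):
    print(directory, len(layout[directory]))
```

A query that filters on the partition column can then read only the matching directory, which is the usual reason for partitioning by a column such as city.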


Data Exploration and Visualization

Data exploration and visualization are done using SparkR, SparkSQL, ggplot2, htmltools, htmlwidgets, leaflet with the ESRI plugin, magrittr, and other packages. The code and the detailed exploration are available in the “Step02b_Data_Exploration_Visualization” R notebook in the artifacts section. Below are some highlights from this notebook.

Top 30 and bottom 30 safety incidents reported in Chicago:

top30_bottom30_chicago
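
The ranking behind a chart like this amounts to counting incidents per category and taking both ends of the sorted list. The following is a minimal sketch of that idea in plain Python, not the SparkR/SparkSQL query used in the notebook; the function name `top_and_bottom` and the sample categories are hypothetical.

```python
from collections import Counter


def top_and_bottom(categories, n=30):
    """Return the n most frequent and n least frequent categories with counts."""
    ranked = Counter(categories).most_common()  # sorted by count, descending
    top = ranked[:n]
    # Reverse the tail so the least frequent category comes first.
    bottom = ranked[-n:][::-1] if n > 0 else []
    return top, bottom


# Hypothetical sample of reported incident categories.
calls = (
    ["Graffiti"] * 5
    + ["Pothole"] * 3
    + ["Street Light Out"] * 2
    + ["Tree Debris"]
)

top, bottom = top_and_bottom(calls, n=2)
print(top)
print(bottom)
```

When there are fewer than `2 * n` distinct categories, the two lists overlap, which is worth keeping in mind when plotting them side by side.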

Changes over time in the volume of all safety calls and of one specific call type (graffiti in this example):

all_safety_graffiti
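
A volume-over-time series of this kind can be built by bucketing calls by period, once for all calls and once for a single category. The sketch below assumes monthly buckets and records given as `(date, category)` pairs; both are assumptions for illustration, and the notebook itself does this with SparkR and ggplot2.

```python
from collections import Counter
from datetime import date


def monthly_volume(records, category=None):
    """Count records per (year, month), optionally restricted to one category."""
    counts = Counter(
        (day.year, day.month)
        for day, cat in records
        if category is None or cat == category
    )
    return sorted(counts.items())  # chronological order


# Hypothetical sample records.
records = [
    (date(2019, 1, 5), "Graffiti"),
    (date(2019, 1, 20), "Pothole"),
    (date(2019, 2, 3), "Graffiti"),
    (date(2019, 2, 14), "Graffiti"),
]

all_calls = monthly_volume(records)
graffiti = monthly_volume(records, category="Graffiti")
print(all_calls)
print(graffiti)
```

Plotting the two series on the same time axis shows whether one call type follows the overall trend or departs from it.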

Fully explorable geoplot built using leaflet with the ESRI plugin on a subset of the data (attached in the artifacts):

geoplot

The interactive map with a subset of the data can be viewed here