AWS Setup
- Created an S3 bucket
- Created a VPC
- Created a key pair
- Set up the EMR cluster
- Inbound/outbound rules
- Connected to the master node via the command line
- Used the AWS CLI to upload data from a public URL into S3

Tools used
- Hive
- Spark

Libraries used
- SparkR
- ggplot2
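
Sketch of the "upload data from a public URL into S3" step from AWS Setup. This is only one way it might be done, not necessarily the exact commands used: it assumes `curl` and the AWS CLI are installed and configured on the machine, and it streams the download straight into `aws s3 cp - <s3-uri>` (the `-` source tells the CLI to read from stdin). The URL, bucket name, and key below are hypothetical placeholders, and the helper names `s3_uri`, `build_commands`, and `stream_url_to_s3` are made up for illustration.

```python
import shlex
import subprocess


def s3_uri(bucket, key):
    """Build an s3:// URI from a bucket name and an object key."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def build_commands(url, bucket, key):
    """Return (download_argv, upload_argv) for a curl | aws s3 cp pipeline."""
    # curl writes the downloaded bytes to stdout; --fail makes HTTP errors
    # produce a non-zero exit code instead of an error page.
    download = ["curl", "--fail", "--location", "--silent", "--show-error", url]
    # "aws s3 cp - <uri>" uploads whatever arrives on stdin.
    upload = ["aws", "s3", "cp", "-", s3_uri(bucket, key)]
    return download, upload


def stream_url_to_s3(url, bucket, key):
    """Pipe the download into the upload without a temporary local file."""
    download, upload = build_commands(url, bucket, key)
    curl = subprocess.Popen(download, stdout=subprocess.PIPE)
    aws = subprocess.Popen(upload, stdin=curl.stdout)
    # Close our copy of the pipe so curl sees a broken pipe if aws exits early.
    curl.stdout.close()
    aws_rc = aws.wait()
    curl_rc = curl.wait()
    if curl_rc != 0 or aws_rc != 0:
        raise RuntimeError(f"transfer failed (curl={curl_rc}, aws={aws_rc})")


if __name__ == "__main__":
    # Hypothetical values; replace with the real public URL, bucket, and key.
    download, upload = build_commands(
        "https://example.com/data.csv", "my-example-bucket", "raw/data.csv"
    )
    # Print the equivalent shell pipeline instead of running it.
    print(shlex.join(download), "|", shlex.join(upload))
```

Run as a script, this only prints the equivalent shell pipeline; calling `stream_url_to_s3(...)` performs the actual transfer and requires valid AWS credentials with write access to the bucket.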