kks32-courses/data-analytics
Data analytics and machine learning with Spark

Krishna Kumar, Department of Engineering, University of Cambridge

License: https://creativecommons.org/licenses/by-nc-sa/4.0/
Read online: https://www.gitbook.com/read/book/kks32-courses/data-analytics

Course description

This course covers the basics of using Spark for data analytics. Spark is rapidly becoming the compute engine of choice for big data. Spark programs are more concise than equivalent Hadoop MapReduce jobs and can run 10-100 times faster, particularly for in-memory workloads.

This course will teach you the basics of working with Spark (PySpark) and will provide you with the necessary foundation for diving deeper into Spark. You will learn about Spark’s architecture and programming model, including commonly used APIs. After completing this course, you will be able to write and debug basic Spark applications. The focus of this course will be Spark Core, Spark SQL, and Spark MLlib. This course will also cover real-time data streaming and processing as well as data visualisation techniques.

Prerequisites

  • Knowledge of the Unix/Linux command line and SSH.
  • Python programming.