Basics of Big Data and Machine Learning using Apache Spark and Scala

anujdutt9/BigData-and-Machine-Learning

BigData-and-Machine-Learning

This repository contains Scala code with examples and projects using Apache Spark for Big Data and Machine Learning.

Repository Contents:

  1. Scala Basics

This folder introduces the basics of Scala programming: arithmetic, booleans, strings and tuples.
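The topics listed above can be sketched in a few lines. This is a minimal illustration rather than code taken from the folder; all value names are made up for the example, and the snippet is meant to be pasted into the Scala REPL or `spark-shell` (for instance with `:paste`).

```scala
// Arithmetic: Int division truncates, Double division does not.
val intQuotient = 7 / 2
val doubleQuotient = 7.0 / 2
val remainder = 7 % 2

// Booleans combine with &&, || and !.
val isSmallEven = (4 % 2 == 0) && (4 < 10)

// Strings: interpolation and a couple of common methods.
val name = "Spark"
val greeting = s"Hello, $name!"
val shout = name.toUpperCase
val nameLength = name.length

// Tuples group a fixed number of values of possibly different types.
val release = ("Scala", 2, true)
val (language, major, isStable) = release // destructure into three values
val firstField = release._1               // tuple fields are 1-indexed
```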

  2. Scala Collections

This folder covers the basic Scala collections: Arrays, Lists, Sets and Maps.
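As a rough sketch of how these four collection types behave (the names and values below are illustrative assumptions, not taken from the folder), the following can be pasted into the Scala REPL or `spark-shell`:

```scala
// Array: fixed size, elements can be updated in place, indexed from 0.
val scores = Array(10, 20, 30)
scores(0) = 15

// List: immutable; :: prepends an element and returns a new list.
val base = List(2, 3)
val extended = 1 :: base

// Set: duplicates are discarded.
val unique = Set(1, 2, 2, 3)

// Map: immutable key-value pairs; + returns a new map.
val ports = Map("http" -> 80, "https" -> 443)
val morePorts = ports + ("ssh" -> 22)
val httpPort = ports("http")
val missingPort = ports.getOrElse("ftp", -1) // default when the key is absent

// Common transformations return new collections.
val doubled = extended.map(_ * 2)
val evens = extended.filter(_ % 2 == 0)
val total = extended.sum
```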

  3. Scala Programming

This folder contains the basic programming concepts in Scala like control flow, loops (for and while) and functions.
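A small sketch of the constructs named above, assuming it is pasted into the Scala REPL or `spark-shell`. The function and variable names (`sign`, `power`, `addOne` and so on) are hypothetical and chosen only for this example.

```scala
// if/else is an expression, so it can be the body of a function.
def sign(n: Int): String = {
  if (n > 0) "positive" else if (n < 0) "negative" else "zero"
}

// for loop over a range, accumulating into a mutable variable.
var sumTo5 = 0
for (i <- 1 to 5) {
  sumTo5 += i
}

// for with yield builds a new collection instead of mutating state.
val squares = for (i <- 1 to 4) yield i * i

// while loop: repeats until the condition becomes false.
var countdown = 3
var steps = 0
while (countdown > 0) {
  countdown -= 1
  steps += 1
}

// Function with a default parameter value.
def power(base: Int, exponent: Int = 2): Int = {
  var result = 1
  for (_ <- 1 to exponent) result *= base
  result
}

// Anonymous function stored in a value.
val addOne = (x: Int) => x + 1
```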

  4. Apache Spark Machine Learning using Scala

This folder contains code demonstrating the use of Apache Spark's Machine Learning library, MLlib, from Scala. It shows how to use the Machine Learning algorithms supported by the Apache Spark framework, and also contains some basic projects that apply these skills.

The folders under this directory contain the code for:

a) Classification

b) Clustering

c) Model Validation

d) Regression

e) PCA

f) Recommender System

In addition, a project covering the above techniques is also included.

NOTE: All dataset files used are included in the respective folders. Additional datasets used are in the Datasets folder.
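To give a feel for the classification item in the list above, here is a minimal sketch of training a logistic regression model with Spark's DataFrame-based ML API. It is not the repository's code: the in-memory rows, the column names `x1` and `x2`, and the parameter values are assumptions made for illustration. It assumes Spark is on the classpath, for example inside `spark-shell`.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

// Reuses the existing session inside spark-shell, or starts a local one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ClassificationSketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical toy data: a Double label and two numeric features.
val raw = Seq(
  (0.0, 1.0, 0.5),
  (0.0, 1.5, 0.2),
  (0.0, 0.8, 0.9),
  (1.0, 4.0, 3.5),
  (1.0, 4.5, 3.9),
  (1.0, 5.0, 3.1)
).toDF("label", "x1", "x2")

// MLlib estimators expect the features packed into one vector column.
val assembler = new VectorAssembler()
  .setInputCols(Array("x1", "x2"))
  .setOutputCol("features")
val training = assembler.transform(raw)

// Illustrative hyperparameters, not tuned values.
val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.01)

val model = lr.fit(training)

// transform appends prediction columns to the input rows.
val predictions = model.transform(training)
predictions.select("label", "prediction").show()
```

In a real project the model would be evaluated on held-out data rather than on the training rows, which is what the Model Validation folder is concerned with.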

  5. Apache Spark DataFrames

This folder contains code for working with Apache Spark DataFrames in Scala. It covers basic DataFrame operations such as removing null values and grouping rows by their contents.

It also contains the code for two projects, one using a Netflix dataset and one using a Sales dataset.
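The null handling and grouping operations mentioned above can be sketched as follows. The rows and the column names `company` and `amount` are made up for this example and are not the repository's Sales dataset. The snippet assumes Spark is on the classpath, for example inside `spark-shell`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, count}

// Reuses the existing session inside spark-shell, or starts a local one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("DataFrameSketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sales rows; None becomes a null in the DataFrame.
val sales = Seq(
  ("Acme", Some(100.0)),
  ("Acme", Some(300.0)),
  ("Globex", None),
  ("Globex", Some(50.0))
).toDF("company", "amount")

// Option 1: drop every row that contains a null.
val cleaned = sales.na.drop()

// Option 2: keep the rows and replace nulls in "amount" with 0.0.
val filled = sales.na.fill(0.0, Seq("amount"))

// Group by company and aggregate the remaining amounts.
val summary = cleaned
  .groupBy("company")
  .agg(count("amount").as("orders"), avg("amount").as("avg_amount"))

summary.show()
```

Whether to drop or fill missing values depends on the data: dropping shrinks the dataset, while filling with a constant can skew averages.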

Requirements:

  1. Apache Spark (Latest Version)
  2. Scala
  3. A text editor such as Atom or Notepad++, or an IDE such as IntelliJ IDEA, for writing the code.

NOTE: To test the installation, open a command prompt and run:

spark-shell

If the installation works, you should see the following prompt:

scala>

Resources:

| S.No. | Name | Link |
|-------|------|------|
| 1. | Apache Spark Website (Latest Version) | http://spark.apache.org/docs/latest/ml-guide.html |
| 2. | Scala | https://www.scala-lang.org/ |
| 3. | Atom Text Editor | https://atom.io/ |