big_data

Big Data for beginners

Explore a variety of tutorials and interactive demonstrations focused on Big Data technologies like Hadoop, Spark, and more, presented mostly as Jupyter notebooks. Most notebooks are self-contained and include instructions for installing all required services. They can be run on Google Colab or in a virtual Ubuntu machine/container.

Setting Up Hadoop: Single-Node Configuration

Running Apache Spark in Standalone Mode

MapReduce Tutorials

PySpark Tutorials

Miscellaneous Tutorials

Virtualization and Cloud Automation

Big Data Learning Pathways

About this repository

Notebooks Testing and CI

Most executable Jupyter notebooks are tested on an Ubuntu virtual machine by an automated GitHub workflow. Successful executions are logged in the file action_log.txt (see also: Google Colab vs. GitHub Ubuntu Runner).

Current status:

  • Run Notebooks on Ubuntu
  • Run One Notebook on Ubuntu

The GitHub workflow is a starting point for what is known as Continuous Integration (CI) in DevOps/Platform Engineering circles.
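
To give a rough idea of what such a CI step does, here is a minimal Python sketch that executes a list of notebooks with `jupyter nbconvert` and records the ones that ran without errors. It is an illustration only, not the workflow used in this repository: the helper names `build_command` and `run_notebooks`, the default timeout, and the one-name-per-line log format are all assumptions.

```python
import subprocess
from pathlib import Path


def build_command(notebook: Path, timeout_s: int = 600) -> list[str]:
    """Build the nbconvert command that executes one notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        f"--ExecutePreprocessor.timeout={timeout_s}",  # per-cell timeout in seconds
        str(notebook),
    ]


def run_notebooks(notebooks, log_path: Path, runner=subprocess.run):
    """Run each notebook and write the names of the successful ones to log_path.

    `runner` is injectable so the logic can be exercised without Jupyter installed.
    The log format (one notebook path per line) is an assumption.
    """
    passed = []
    for nb in notebooks:
        result = runner(build_command(nb))
        if result.returncode == 0:  # nbconvert exits non-zero when a cell fails
            passed.append(nb)
    log_path.write_text("".join(f"{nb}\n" for nb in passed))
    return passed


if __name__ == "__main__":
    # Hypothetical usage: run every notebook in the current directory.
    found = sorted(Path(".").glob("*.ipynb"))
    ok = run_notebooks(found, Path("action_log.txt"))
    print(f"{len(ok)} of {len(found)} notebooks executed successfully")
```

In a real workflow, a script like this would run as one step on the Ubuntu runner after the notebook dependencies have been installed.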