This project contains Databricks notebooks that explain the various functions available in the Spark Scala API for Spark 2.x. Each notebook starts with a short explanation of the concept behind an API, then shows example code with an explanation of how to use it.
S.No. | Topic | Contents |
---|---|---|
1 | Spark Session | This notebook covers the basic functions available through the `SparkSession` API, such as configuration settings, reading data, and metadata functions. |
2 | DataFrame vs Dataset | This notebook compares the two structured APIs, DataFrames and Datasets, and explains the differences between them programmatically. |
3 | Catalogue Functions | This notebook explains the standard API for accessing metadata, both session-level (temporary tables and UDFs registered on the SQL context) and permanent (the Hive metastore or HCatalog). |
4 | Basic Dataset Functions | This notebook explains some of the basic methods available in the Dataset API, such as `schema`, `explain`, and view creation. |
5 | Datasets - Typed Transformation Functions | This notebook explains some of the typed transformation functions, such as `map`, `filter`, `flatMap`, `randomSplit`, `repartition`, `groupByKey`, and `sample`. |
6 | Basic SQL Functions | This notebook explains basic SQL functions such as `select`, `filter`, `where`, `orderBy`, `sort`, and `limit`, along with the `na` and `stat` functions. |
7 | Aggregate Functions | This notebook covers the aggregation functions available in Spark, such as `groupBy`, window functions, `pivot`, `rollup`, and `cube`. |
8 | Join Functions | This notebook covers the join types available in Spark: inner, outer, left outer, right outer, left semi, left anti, cross, and natural joins. |
9 | Datasources | This notebook covers reading and writing data from and to various data sources, such as CSV, JSON, ORC, Parquet, Avro, Hive tables, SQL tables, and XML files. |
10 | Dataset Actions | This notebook covers the actions available on a Dataset. |
11 | Column Functions | This notebook covers some of the functions that work on columns in a DataFrame or Dataset. |