Data transformation simplified for any data platform.
Features:
The package covers the complete ETL process:
- Uses metadata, transformation, and data model information to design the ETL pipeline
- Builds the target transformations as Spark SQL and Spark DataFrames
- Builds source & target Hive DDLs
- Validates DataFrames, extends core classes, defines DataFrame transformations, and provides UDF SQL functions.
- Supports the following fundamental transformations for the ETL pipeline:
  - Filters on source & target DataFrames
  - Grouping and aggregations on source & target DataFrames
  - Heavily nested queries / DataFrames
- Parses complex, heavily nested XML, JSON, Parquet & ORC data to the nth level of nesting
- Has unit test cases designed at the function/method level and measures source code coverage
- Has information about deploying to higher environments
- Has API documentation for customization & enhancement
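
To make "parsing to the nth level of nesting" concrete, here is a minimal sketch in plain Python of how an arbitrarily nested JSON-like record can be flattened into dotted column names. It is not the package's parser, which presumably works on Spark DataFrames. The function name `flatten`, the `sep` parameter, and the sample record are all hypothetical and chosen only for illustration.

```python
from typing import Any, Dict


def flatten(record: Any, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Recursively flatten nested dicts and lists into a single-level dict.

    Hypothetical helper for illustration; not part of the package's API.
    """
    flat: Dict[str, Any] = {}
    if isinstance(record, dict):
        # Descend into each key, extending the dotted path.
        for key, value in record.items():
            child = f"{prefix}{sep}{key}" if prefix else str(key)
            flat.update(flatten(value, child, sep))
    elif isinstance(record, list):
        # List elements are addressed by their position.
        for index, value in enumerate(record):
            child = f"{prefix}{sep}{index}" if prefix else str(index)
            flat.update(flatten(value, child, sep))
    else:
        # Leaf value: store it under the accumulated path.
        flat[prefix] = record
    return flat


# Sample record (made up for this sketch).
nested = {
    "order": {
        "id": 7,
        "items": [{"sku": "A1", "qty": 2}, {"sku": "B4", "qty": 1}],
    }
}
flat = flatten(nested)
for column, value in flat.items():
    print(column, "=", value)
```

The recursion has no depth limit beyond Python's own recursion limit, which is what lets the same function handle any level of nesting. Empty dicts and lists produce no columns in this sketch.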
Enhancements:
In progress:
- Integrate audit and logging: define error codes, log process failures, and audit progress and runtime information