Table of contents

- Welcome to Sparkitecture!
- Cloud Service Integration
  - Azure Storage
  - Azure SQL Data Warehouse / Synapse
  - Azure Data Factory
- Data Preparation
  - Reading and Writing Data
  - Shaping Data with Pipelines
  - Other Common Tasks
- Machine Learning
  - About Spark MLlib
  - Classification
    - Logistic Regression
    - Naïve Bayes
    - Decision Tree
    - Random Forest
    - Gradient-Boosted Trees
  - Regression
    - Linear Regression
    - Decision Tree
    - Random Forest
    - Gradient-Boosted Trees
  - MLflow
  - Feature Importance
  - Model Saving and Loading
  - Model Evaluation
- Streaming Data
  - Structured Streaming
- Operationalization
  - API Serving
  - Batch Scoring
- Natural Language Processing
  - Text Data Preparation
  - Model Evaluation
- Bioinformatics and Genomics
  - Glow