A curated list of awesome Apache Spark packages and resources.
Updated Oct 24, 2024 - Shell
Self-service modeling and analysis tool based on the R language and big data. It integrates SparkR, Rserve, and the MLlib machine learning library.
Mirror of https://gitlab.com/zero323/dlt
Azure Databricks - Advent of 2020 Blogposts
A demonstration of using Spark to explore a large dataset with PySpark and SparkR. The files cover data loading, data exploration, and clustering of words from Shakespeare's novels.
This repository contains intermediate-level code snippets useful for data cleaning, exploratory analysis, handling missing data points, outlier detection, and visualization techniques using the graphics, ggplot2, tidycharts, and ggExtra packages. Also, in a particular part of the script you can get basic information about…
Analyzing the safety (311) dataset published by Azure Open Datasets for Chicago, Boston and New York City using SparkR, Spark SQL, and Azure Databricks, with visualization using ggplot2 and leaflet. Focus is on descriptive analytics, visualization, clustering, time series forecasting and anomaly detection.
Big Data workshop with Apache Spark + R on Databricks cloud
BI and Big Data Analytics with SparkR: supervised and unsupervised machine learning techniques. The project aims to apply a supervised and an unsupervised machine learning technique to a dataset in order to test different models and scenarios, interpret the results, make predictions for each model, and visualize the results.
Fit a Cubist regression model on Stack Overflow data and make predictions in a distributed manner with SparkR
R workloads running at scale on Google Cloud
Docker images for testing SparkR builds