Neo4j, Cassandra, Hadoop, PySpark, RDD, MapReduce, Cluster-Computing, DataProc
Updated May 1, 2023 (Python)
Support code for the article "Connecting GCP Dataproc and Elasticsearch: Bridging the Worlds of Big Data and (vector) Search"
Working examples for several GCP components, with instructions on how to run them.
Repository for storing the artifacts of an assignment for the Distributed Computing course.
Big data analysis of the 'shared-world' cloud application.
Repository for following the Udemy course "Airflow2.0 De 0 a Héroe" from the "Datapath" academy.
Google DataProc Spark Scala Job for MNIST Handwritten Digit Recognition using Decision Trees (Spark MLlib)
This repository contains application code for the Wizeline Data Engineering Bootcamp (DEB) 2023. It is one of two repositories for the DEB. The other houses the infrastructure code.
A PySpark project that performs ETL on a Dataproc cluster and writes the data to Google Cloud Storage/BigQuery.
Cloud application to promote responsible tourism and help prevent overtourism.
A PySpark word counter for a book, written for a Digital Innovation One challenge.
A short example of Bayes classification and quantization in PySpark on GCP Dataproc.