Common patterns when writing Apache Spark programs in Scala to execute on Google Cloud Dataproc

prabhaarya/gcp-dataproc-spark-scala

 
 


Scala Spark patterns on Google Cloud Dataproc

This repository contains patterns for the following common tasks on GCP:

  • Google Cloud Storage read / write
  • Google BigQuery read / write
  • Google Cloud Spanner read / write
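As a rough illustration of the first two tasks above, here is a minimal sketch of a Spark job that reads and writes Cloud Storage and BigQuery. It is not taken from this repository's sources: the object name, bucket, and table names are hypothetical placeholders, and it assumes the GCS and BigQuery connectors are available on the cluster classpath. The BigQuery write uses the connector's indirect method, which stages data in a GCS bucket. Cloud Spanner is left out of the sketch.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

// Hypothetical example object; not part of this repository.
object GcpReadWriteSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder resource names (assumptions); replace with real ones.
    val bucket   = "my-example-bucket"
    val bqInput  = "my-project.my_dataset.input_table"
    val bqOutput = "my-project.my_dataset.output_table"

    val spark = SparkSession.builder()
      .appName("gcp-read-write-sketch")
      .getOrCreate()

    // Cloud Storage read: CSV files under a gs:// prefix.
    val csvDf: DataFrame = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(s"gs://$bucket/input/")

    // Cloud Storage write: Parquet, replacing any existing output.
    csvDf.write
      .mode(SaveMode.Overwrite)
      .parquet(s"gs://$bucket/output/")

    // BigQuery read through the spark-bigquery connector.
    val bqDf: DataFrame = spark.read
      .format("bigquery")
      .option("table", bqInput)
      .load()

    // BigQuery write (indirect method): rows are staged in the
    // temporary GCS bucket and then loaded into the target table.
    bqDf.write
      .format("bigquery")
      .option("table", bqOutput)
      .option("temporaryGcsBucket", bucket)
      .mode(SaveMode.Overwrite)
      .save()

    spark.stop()
  }
}
```

On Dataproc, a job like this would typically be packaged as a jar and submitted to the cluster. Running it locally would additionally require credentials and the connector jars on the local classpath.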

Set up

The local development environment needs to match the environment created by the Dataproc image:

Attribute            Dataproc        Local Dev
Dataproc image       2.2-debian12    n/a
Apache Spark         3.5.0           n/a
BigQuery connector   0.34.0          n/a
GCS connector        3.0.0           n/a
Java                 11              zulu-11 (java version "11.0.20")
Scala                2.12.18         2.12.18
IDE                  n/a             IntelliJ IDEA (2022.3.3)
Build system         n/a             sbt
sbt                  n/a             1.9.9
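
One way to keep a local build aligned with the versions in the table is to pin them in the sbt build definition. The following `build.sbt` is a minimal sketch, not this repository's actual build file: the project name is hypothetical, and marking both dependencies as `Provided` assumes the cluster supplies Spark and the BigQuery connector at runtime.

```scala
// build.sbt (sketch). The sbt version itself is pinned separately in
// project/build.properties with the line: sbt.version=1.9.9

// Match the Scala version of the Dataproc 2.2-debian12 image.
ThisBuild / scalaVersion := "2.12.18"

lazy val root = (project in file("."))
  .settings(
    name := "dataproc-spark-sketch", // hypothetical project name
    libraryDependencies ++= Seq(
      // Spark is supplied by the cluster, so it is excluded from the jar.
      "org.apache.spark" %% "spark-sql" % "3.5.0" % Provided,
      // Assumption: the connector is made available at runtime
      // (for example, on the cluster or via a jars option at submit time).
      "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.34.0" % Provided
    )
  )
```

If the connector is not available on the cluster, dropping `Provided` from that dependency and building an assembly jar is a common alternative.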
