This project demonstrates how to set up a data pipeline that processes multiple files with Apache Beam, using a basic word-count example and Solardatatools. It shows how to set up and run the pipeline both on your local machine and on Google Cloud Platform.
Set up the Python virtual environment and install the Python dependencies. In the project folder, run the commands below:
python3 -m venv venv
source ./venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
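The requirements.txt in this repo pins the actual dependencies. As a rough illustration only (the exact contents and versions are not confirmed here), it presumably lists at least the Beam SDK and Solar Data Tools, along the lines of:
apache-beam[gcp]
solar-data-tools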
Create a .env file and set the location of your Google Cloud Platform credential JSON file in it:
export GOOGLE_APPLICATION_CREDENTIALS='your-credential-location'
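The pipeline code can then pick this variable up at startup. As a minimal sketch, assuming the python-dotenv package is used to read the .env file (not confirmed by this repo):

import os
from dotenv import load_dotenv

load_dotenv()  # parses the .env file, honoring the leading 'export'
credentials = os.environ.get('GOOGLE_APPLICATION_CREDENTIALS')
print(f'Using credentials at: {credentials}')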
Under the wordcount-example folder, run the command below:
python -m wordcount --runner DirectRunner \
--input ./kinglear-1.txt \
--output ./results
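For orientation, a minimal Beam word-count pipeline follows the shape sketched below. This is a hypothetical reconstruction, and the wordcount module in this repo may differ in its exact transforms:

import re
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run(input_path='./kinglear-1.txt', output_path='./results'):
    with beam.Pipeline(options=PipelineOptions(runner='DirectRunner')) as p:
        (
            p
            | 'Read' >> beam.io.ReadFromText(input_path)    # one element per line
            | 'Split' >> beam.FlatMap(lambda line: re.findall(r"[A-Za-z']+", line))
            | 'Count' >> beam.combiners.Count.PerElement()  # (word, count) pairs
            | 'Format' >> beam.MapTuple(lambda word, n: f'{word}: {n}')
            | 'Write' >> beam.io.WriteToText(output_path)
        )

if __name__ == '__main__':
    run()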
Under the solardatatools-onefile-example folder, run the command below:
python -m main --runner DirectRunner \
--input ./data \
--output ./parallelism/results \
--direct_num_workers 0
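Setting --direct_num_workers to 0 lets the DirectRunner use as many worker processes as there are available cores, so the files under ./data can be processed in parallel. Conceptually, the per-file work can live in a DoFn like the hedged sketch below (assuming CSV inputs with a datetime index and a single power column; the actual main module may be structured differently):

import apache_beam as beam
import pandas as pd
from solardatatools import DataHandler

class RunSolarDataTools(beam.DoFn):
    def process(self, file_path):
        # Each file is loaded and analyzed independently, which is what
        # lets Beam fan the work out across workers.
        df = pd.read_csv(file_path, index_col=0, parse_dates=True)
        dh = DataHandler(df)
        dh.run_pipeline()
        yield f'{file_path}: capacity estimate {dh.capacity_estimate:.2f} kW'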
In this example, the solardatatools code is packaged as a custom package named transformers, which you have to install before running the pipeline.
Under the solardatatools-custom-package-example folder, run the command below:
pip install ./transformers
Then you can execute the command:
python -m main --runner DirectRunner \
--input ./kinglear-1.txt \
--output ./results
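Wrapping the transform code in an installable package is also what makes it portable to remote runners: on Google Cloud Dataflow, for example, custom code is typically shipped to workers via a setup file. A hypothetical minimal setup.py for the transformers package might look like:

from setuptools import setup, find_packages

setup(
    name='transformers',                    # local package name used above
    version='0.1.0',
    packages=find_packages(),
    install_requires=['solar-data-tools'],  # assumed runtime dependency
)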