
PolnPy

We will try to update this README from time to time during the Game of Code hackathon.

The idea: gather historical pollen data for Luxembourg and predict pollen concentrations.

SWATEC (Scrape, Wrangle, Analyze, Train, then Expose and Consume)

Scrape

What:

  • Scrape the data, since neither an API nor a ready-made data set is available

Data sources:

  • pollen.lu, 26+ years of data (since Jan 1, 1992)
  • wunderground.com, almost 22 years of data (since Jul 1, 1996)

Libraries used: Scrapy

scrapy crawl pol_arch -o pol.csv

scrapy crawl wus -o wu.csv
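
The actual spiders (pol_arch and wus) live in this repo; below is a minimal sketch of what a pollen.lu spider could look like, assuming one table row per day. The spider name, URL and CSS selectors are placeholders, not the real ones.

```python
# Minimal sketch of a pollen.lu spider; the repo's real spider is pol_arch.
# The URL pattern and CSS selectors below are assumptions.
import scrapy


class PollenArchiveSpider(scrapy.Spider):
    name = "pol_arch_sketch"  # hypothetical name
    start_urls = ["https://www.pollen.lu/"]  # archive pages would be added here

    def parse(self, response):
        # Assumed table layout: one row per day, first cell is the date,
        # remaining cells are per-pollen counts.
        for row in response.css("table tr"):
            cells = row.css("td::text").getall()
            if len(cells) >= 2:
                yield {
                    "date": cells[0].strip(),
                    "counts": [c.strip() for c in cells[1:]],
                }
```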

Wrangle

What:

  • Cleanse
  • Sort
  • Join

Libraries used:

Move the files from the previous step into this directory and...

python sort_bydate.py -in pol.csv -out pol_sorted.csv

python sort_bydate.py -in wu.csv -out wu_sorted.csv

python join_plus_pol_wu.py -pol pol_sorted.csv -wu wu_sorted.csv -out pol_wu.csv
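
A minimal pandas sketch of what the two scripts above do, assuming both CSVs share a date column named "date" (the real column names may differ):

```python
# Sketch of the sort_bydate.py / join_plus_pol_wu.py steps using pandas;
# the "date" column name is an assumption.
import pandas as pd

pol = pd.read_csv("pol.csv", parse_dates=["date"]).sort_values("date")
wu = pd.read_csv("wu.csv", parse_dates=["date"]).sort_values("date")

pol.to_csv("pol_sorted.csv", index=False)
wu.to_csv("wu_sorted.csv", index=False)

# Inner join on date so each row has pollen counts plus that day's weather.
pol_wu = pol.merge(wu, on="date", how="inner")
pol_wu.to_csv("pol_wu.csv", index=False)
```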

Analyze

What:

  • Discover and visualize the data
  • Try to identify some correlations
  • Conclusion: it is very much a time-series problem (good candidates: Random Forest Regressor, Facebook Prophet, LSTM)

Libraries used:

jupyter notebook PolnPyAnalysis.ipynb and run all cells
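
A rough sketch of the kind of exploration done in the notebook, assuming the joined file has a date column plus numeric weather and pollen columns (the "graminea" column name is an assumption):

```python
# Quick look at correlations between weather and pollen counts, similar in
# spirit to PolnPyAnalysis.ipynb; column names are assumptions.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("pol_wu.csv", parse_dates=["date"]).set_index("date")

# Pairwise correlations between weather variables and pollen counts.
print(df.corr(numeric_only=True))

# Plot one pollen type over time to see the strong seasonal pattern.
df["graminea"].plot(title="Graminea pollen concentration")
plt.show()
```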

Train

What:

Train and test a few models:

  • Random Forest Regressor, pretty interesting results!
  • Prophet, good results, but the range between the lower and upper estimate is quite large
  • LSTM, best results; our production setup runs on this model

Libraries used:

jupyter notebook PolnPyRandomForest.ipynb and run through all steps

[Dirty solution to be refactored...]

The same-day forecast (weather_today.csv), the model (RFR_model.sav) and the consume_model.py script were tested in the backend
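
For reference, a minimal sketch of the Random Forest step, assuming weather columns as features and one pollen type as the target; the feature and target column names are placeholders, the notebook defines the real ones:

```python
# Sketch: train a Random Forest on daily weather features and save it as
# RFR_model.sav; feature/target column names are assumptions.
import pickle
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("pol_wu.csv", parse_dates=["date"])
features = ["temp_max", "temp_min", "humidity", "precipitation"]  # assumed
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["graminea"], shuffle=False  # keep chronological order
)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("R^2 on held-out period:", model.score(X_test, y_test))

# consume_model.py presumably loads this file to serve same-day predictions.
with open("RFR_model.sav", "wb") as f:
    pickle.dump(model, f)
```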

jupyter notebook PolnPyProphet.ipynb and run through all steps to see the results
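
A minimal sketch of the Prophet experiment, assuming a single pollen series is fitted; "ds" and "y" are Prophet's required column names, "graminea" is a placeholder:

```python
# Sketch: fit Prophet on one pollen series and forecast a couple of days ahead.
import pandas as pd
from prophet import Prophet  # packaged as "fbprophet" at the time of the hackathon

df = pd.read_csv("pol_wu.csv", parse_dates=["date"])
ts = df.rename(columns={"date": "ds", "graminea": "y"})[["ds", "y"]]

m = Prophet()
m.fit(ts)

future = m.make_future_dataframe(periods=2)  # same day + next day
forecast = m.predict(future)
# yhat_lower / yhat_upper show the wide uncertainty range mentioned above.
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(2))
```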

jupyter notebook PolnPyLSTM.ipynb and run through all steps

[Dirty solution to be refactored...]

The same-day forecast (weather_today_for_LSTN.csv), the model (LSTM_model.h5), the model JSON (model.json) and the consume_LSTM_model.py script will be used by the backend
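
A minimal Keras sketch of how such an LSTM could be built and exported to the two files above; the window length, feature count and training data below are placeholders, not the notebook's actual setup:

```python
# Sketch: train a small LSTM on short windows of past days and save both the
# architecture (model.json) and the weights (LSTM_model.h5).
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

window, n_features = 7, 4  # 7 past days of 4 weather/pollen features (assumed)

model = Sequential([
    LSTM(32, input_shape=(window, n_features)),
    Dense(1),  # next-day pollen concentration
])
model.compile(optimizer="adam", loss="mse")

# X: (samples, window, n_features), y: (samples,) would be built from pol_wu.csv;
# random placeholder data keeps this sketch self-contained.
X = np.random.rand(100, window, n_features)
y = np.random.rand(100)
model.fit(X, y, epochs=5, batch_size=16, verbose=0)

# consume_LSTM_model.py presumably reloads these two files in the backend.
with open("model.json", "w") as f:
    f.write(model.to_json())
model.save_weights("LSTM_model.h5")
```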

Expose

What:

2 RESTful API endpoints:

  • One to get historical data (all pollen types since 1996)
  • One to make predictions for the same day and the next day (ambrosia, betula and graminea only)
  • And a small helper to list the supported pollen types

A small client sketch follows the stack below.

Docker

Symfony

Redis

MongoDB
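
To illustrate the consuming side, a hypothetical Python client; the routes, port and query parameters below are assumptions, the real ones are defined in the Symfony backend:

```python
# Hypothetical client for the two endpoints plus the helper; paths are assumed.
import requests

BASE = "http://localhost:8000"  # assumed local Docker setup

history = requests.get(f"{BASE}/pollen/history", params={"type": "betula"})
prediction = requests.get(f"{BASE}/pollen/prediction", params={"type": "betula"})
types = requests.get(f"{BASE}/pollen/types")

print(history.json(), prediction.json(), types.json())
```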

Consume

What:

Front end

...
