The project explores ways to predict party affiliation from text segments. Both Machine Learning and Deep Learning approaches are tested. It is the result of a two-week project at the Le Wagon Data Science Bootcamp, Batch 606 Berlin.
A demo is available at http://bundesterminator.herokuapp.com/.
To train the models, the meeting minutes of the German Parliament were used. They are available as XML files from the open data website of the German Parliament. The XML files were pre-processed and converted into CSV files (when the project was built, the Python library pandas had no XML import).
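The XML-to-CSV step described above can be illustrated with a minimal sketch that uses only the Python standard library. The element and attribute names (`protocol`, `speech`, `party`, `p`) and the helper functions are hypothetical and chosen for illustration; the real Bundestag XML schema and the project's actual pre-processing code may look quite different.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified sample; the real Bundestag schema differs.
SAMPLE_XML = """
<protocol>
  <speech party="SPD">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </speech>
  <speech party="CDU/CSU">
    <p>Another speech.</p>
  </speech>
</protocol>
"""


def speeches_to_rows(xml_text):
    """Return one (party, text) tuple per <speech> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for speech in root.iter("speech"):
        party = speech.get("party", "")
        # Join all paragraphs of a speech into a single text segment.
        paragraphs = [(p.text or "").strip() for p in speech.findall("p")]
        rows.append((party, " ".join(paragraphs)))
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["party", "text"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = speeches_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
```

The resulting CSV text can be written to a file and loaded with `pandas.read_csv` for training.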
The trained model can be exposed through a web API. The API uses a lean setup based on FastAPI and Uvicorn. The deployment settings assume deployment on Heroku.
The bundestag folder represents the bundestag Python package.
It contains the main files for training the models.
Pipeline for a machine learning approach.
Class that wraps the functionality needed to train a Deep Learning model with TensorFlow Keras and a trained Gensim word2vec model.
Light wrapper around the Gensim word2vec module.
Helper function to acquire the data.
Helper function to pre-process the data.
Other files are added to enable deployment of the API to Heroku and to have an automated workflow based on GitHub Actions.
Please note that you need to set environment variables to deploy on Google Cloud Platform. This currently needs to be done directly in data.py, trainer.py and bundestrainer.py. For the MAKEFILE, environment variables need to be set as well. This will be replaced by a more flexible approach in the future.
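One possible shape for such a more flexible approach is to read all deployment settings from the process environment in a single place instead of editing each file. The sketch below is only an illustration: the variable names `GCP_PROJECT_ID` and `GCP_BUCKET_NAME` and the function `load_gcp_settings` are hypothetical and not part of the project.

```python
import os


def load_gcp_settings(environ=os.environ):
    """Collect deployment settings from environment variables.

    The variable names below are hypothetical placeholders.
    """
    required = ("GCP_PROJECT_ID", "GCP_BUCKET_NAME")
    # Fail early with a clear message if any setting is missing or empty.
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in required}
```

Modules such as data.py or trainer.py could then call this function once at start-up, and the same variables could be exported in the shell before invoking the MAKEFILE targets.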
The work is a collaborative effort of the following team members, who each contributed to the project:
We cannot thank the AMAAAAAZIIIING team of Le Wagon enough. Their patience, expertise, and dedication opened a new world for us.