This repository is a template for a python data science project.
The root should contain at least the following elements:
project/
: a folder named after the project containing the code. Inside, you should follow the code organisation described belowREADME.md
: a markdown file describing very shortly the objective, methods, main results, and how to use the repodoc/
: folder containing all additional and detailed documentation of your repository..gitignore
: list of git ignored files
Optionally, you may find a sonar-project.properties
which is a configuration file for the sonarqube software, a code quality evaluator.
Inside project/
, the code should be organised according to Domain Driven Design (hereafter DDD):
infrastructure/
: data collection and technical cleaning, connector to the databasedomain/
: domain-related cleaning, feature engineering, machine learning algorithm, domain specific knowledgeapplication/
: application orchestrator or API, you may define here the order in which the cleaning processes must be runinterface/
: user interface (eg: javascript front-end). The template includes a dashboard template from ng2-admin
In addition, you will find two folders:
tests/
: unit (functions test), integration (pipeline test), acceptance (expected domain values test), scalability (workload test)utils/
: very generic functions
- Add the repo directory to your
PYTHONPATH
environment variable (eg: in your.bashrc
) - Change the name of
project/
directory, if necessary - Configuration parameters (eg: location of data, API port, database configuration, etc.) should be set into a .ini file inside
project/application/
. You may copyproject/template.ini
into amy_config.ini
which should not be committed. - In order to access the configuration parameters, you can use :
from project.utils.config import Config
config = Config("template.ini")
print(config["database"]["password"]) #return 'password_test'
warning: Config
returns a ConfigParser
object. Please see the configparser module documentation for drawbacks (eg: an integer passed into the configuration file will be recognized as a str object)
- Import a python module using :
import project.domain.sub_directory.my_file
You don't know how to start ? Check out the python examples scripts inside each folder, starting by application/main.py
. The goal of the example is to use a iris.csv
dataset to make a prediction on two new rows using a random forest regressor.
Support for python3.x only