fconcas/amesml

Ames Data Analysis and Machine Learning

A demonstration of data analysis and machine learning techniques for regression problems on tabular data, using the Ames Housing Dataset.

Project Structure

.
├───config      # Configuration files
├───data        # Data for training the model
├───model       # Saved model
├───notebooks   # Notebooks for analysis
├───references  # Data documentation
├───src         # Source files
├───static      # Flask static directory
└───templates   # Flask templates directory

How It Works

  • The root directory contains two main scripts: train_model.py and app.py, plus two analysis notebooks in the notebooks subdirectory.
  • The notebooks are used to analyse the data and prototype machine learning.
  • The train_model.py script trains a LightGBM model and saves it in the model subdirectory. If the data is not present in the data directory, it is automatically downloaded.
  • The app.py script runs a Flask application that serves a web interface for predicting house prices with the model generated by train_model.py.
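
The behaviour described for train_model.py (download the data only when it is missing, then save the trained model) might be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the function names, the `fetch` callable, and the `ames.csv` file name are hypothetical, and any picklable object can stand in for the LightGBM model.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def ensure_data(data_path: Path, fetch: Callable[[], bytes]) -> Path:
    """Write the dataset to data_path only if it is not already on disk.

    `fetch` is a hypothetical callable that returns the raw dataset bytes,
    for example by downloading them.
    """
    if not data_path.exists():
        data_path.parent.mkdir(parents=True, exist_ok=True)
        data_path.write_bytes(fetch())
    return data_path


def save_model(model: Any, model_path: Path) -> Path:
    """Pickle a trained model, creating the target directory if needed."""
    model_path.parent.mkdir(parents=True, exist_ok=True)
    with model_path.open("wb") as fh:
        pickle.dump(model, fh)
    return model_path


def load_model(model_path: Path) -> Any:
    """Load a pickled model back, as a serving script would need to do."""
    with model_path.open("rb") as fh:
        return pickle.load(fh)
```

Passing the download step in as a callable keeps the "skip if present" logic independent of where the data comes from. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.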

Configuring the Environment

You have three different options to run this software: using your local environment, using a Conda virtual environment, or using Docker.

Configuring the local environment

Download and install Python if it is not already installed. I recommend version 3.12 (latest revision). Then open a terminal window, navigate to this project directory, and run the command:

pip install -r requirements.txt

This ensures that your pip environment contains the required libraries, or compatible versions. You're now ready to run this software.
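
If you want to confirm which packages from a requirements file are actually installed, a check along these lines may help. It is a simplified sketch: the function name is hypothetical, it only tests whether each distribution is present (not whether the installed version satisfies the specifier), and it skips option lines such as `-r other.txt`.

```python
import re
from importlib import metadata
from pathlib import Path


def missing_requirements(req_file: Path) -> list[str]:
    """Return the names listed in req_file that are not installed."""
    missing = []
    for raw in req_file.read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # Take the leading distribution name, ignoring version specifiers.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_requirements(Path("requirements.txt"))` returns an empty list when every listed package is found in the current environment.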

Optional: if you also wish to run the notebooks, install the additional requirements:

pip install -r requirements-notebooks.txt

Once installed, you can run Jupyter with the command jupyter-lab and run the notebooks using its interface.

Configuring a Conda environment

Download and install Conda if it is not already present on your system. I recommend Miniconda. During the installation, make sure to deselect both the option to add Anaconda to PATH and the option to register it as the default Python, to avoid system-wide problems.

Open Anaconda Prompt and create a new environment (in this example we call it amesml, but you can name it however you like):

conda create -n amesml python=3.12

Answer y when prompted for confirmation. Once the environment has been created, activate it:

conda activate amesml

Navigate to this project directory, then install the requirements:

pip install -r requirements.txt

You're now ready to run this software.

Optional: if you also wish to run the notebooks, install the additional requirements:

pip install -r requirements-notebooks.txt

Once installed, you can run Jupyter with the command jupyter-lab and run the notebooks using its interface.

Packaging with Docker

The application can be built as a Docker image (in this example we call it amesml, but you can name it however you like):

docker build -t amesml .

Building the image for the first time will take several minutes.

Usage Instructions

The app.py script runs a server at 127.0.0.1:5000 (localhost:5000). While the server is running, open that address in a browser to reach the web interface for the machine learning model.
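
To check from a script that the server is answering on that address, something like the following could be used. It is a sketch using only the standard library; the function name is hypothetical, and requesting the root path `/` is an assumption about the application's routes (any HTTP response, including an error status, is treated as "up").

```python
import http.client


def server_is_up(host: str = "127.0.0.1", port: int = 5000,
                 timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers on host:port."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        conn.getresponse().read()  # any status code counts as reachable
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed response.
        return False
    finally:
        conn.close()
```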

Local environment or Conda

Open a terminal (or Anaconda Prompt if using Conda) and navigate to this project directory.

Training the model

If the model file ames_regressor.pickle is missing from the model directory, you can generate it by invoking the train_model.py script:

python train_model.py

Training takes from a few to several seconds, depending on how powerful your machine is.
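
The "train only if the model file is missing" step can also be automated. The sketch below assumes it is run with the project directory as `project_dir`; the helper names are hypothetical, while the file and script names come from this README.

```python
import subprocess
import sys
from pathlib import Path

# Location of the saved model, relative to the project directory.
MODEL_PATH = Path("model") / "ames_regressor.pickle"


def model_missing(project_dir: Path = Path(".")) -> bool:
    """Return True when the saved model file does not exist."""
    return not (project_dir / MODEL_PATH).is_file()


def train_if_missing(project_dir: Path = Path(".")) -> bool:
    """Run train_model.py only when the model is missing.

    Returns True if training was started, False if it was skipped.
    """
    if not model_missing(project_dir):
        return False
    # Use the current interpreter so the active environment is respected.
    subprocess.run([sys.executable, "train_model.py"],
                   cwd=project_dir, check=True)
    return True
```

With `check=True`, a failing training run raises `subprocess.CalledProcessError` instead of passing silently.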

Running the application

To run the main application, use the command:

python app.py

Docker

Once the image is built, run it while publishing the container's port 5000 to 127.0.0.1:5000 (localhost:5000) on the host:

docker run -p 127.0.0.1:5000:5000 amesml:latest

If you want to run the service in the background, detaching it from the terminal, add the -d argument:

docker run -dp 127.0.0.1:5000:5000 amesml:latest
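
The value passed to -p has the form host_ip:host_port:container_port. If you launch the container from a script, the argument list can be assembled as in this sketch; the helper name and its parameters are hypothetical, and the function only builds the command without running it.

```python
import shlex


def docker_run_args(image: str = "amesml:latest",
                    host_ip: str = "127.0.0.1",
                    host_port: int = 5000,
                    container_port: int = 5000,
                    detach: bool = False) -> list[str]:
    """Build the argument list for the docker run commands shown above."""
    args = ["docker", "run"]
    if detach:
        args.append("-d")  # run in the background
    # Publish the container port on the given host address and port.
    args += ["-p", f"{host_ip}:{host_port}:{container_port}", image]
    return args


def as_command_line(args: list[str]) -> str:
    """Render the argument list as a shell-safe string for display."""
    return shlex.join(args)
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting issues.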

Cleanup

If you used Conda or Docker, the instructions below explain how to clean up the environments.

Conda cleanup

Open Anaconda Prompt if not already open. To remove the environment, deactivate it first if it's active:

conda deactivate

You can now remove the environment with the command:

conda env remove -n amesml

Answer y when prompted for confirmation.

Docker cleanup

To remove the packages and images, use Docker's interface. To empty the cache, use the command:

docker builder prune

Answer y when prompted for confirmation.
