By: Kristof Vandewynckel - Junior Data Scientist at BeCode
This project follows up on an older webscraping project, found here, in which we used Selenium to scrape real estate information from the website ImmoWeb. The assignment given by ImmoEliza was to create a machine learning model that predicts housing prices based on user input, and to deploy it as a working app on Heroku.
Code Used:
- Python
Libraries Used:
- Pandas
- Numpy
- Pickle
- Sklearn
- Flask
- Math
Deployment through Heroku and Git.
- model: Our saved Machine Learning model, used for prediction.
- predict: Our function to train the model and save it.
- preprocessing: Our function to preprocess the scraped data and make it usable for prediction.
- static: Contains the .css file used in our app.
- templates: Contains the .html files used in our app.
Folder used: /Preprocessing
To begin, we take the housing data from our webscraping project. Our preprocess() function cleans up all the data. This includes removing NaN values, dealing with unreadable or incomplete data, merging values with the same meaning, and converting strings to dummy variables (0/1) so the model can use them.
Once this is done, everything is saved to a filtered .csv file for use by our model.
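A minimal sketch of what a preprocess() step like this could look like with Pandas. The column names (`price`, `area`, `bedrooms`, `property_type`, `building_state`) and the `STATE_SYNONYMS` mapping are hypothetical; the real function works on whatever columns the ImmoWeb scrape produced.

```python
from typing import Optional

import pandas as pd

# Hypothetical mapping: different labels on the site that mean the same thing.
STATE_SYNONYMS = {
    "as new": "good",
    "just renovated": "good",
    "to renovate": "to_renovate",
    "to restore": "to_renovate",
}


def preprocess(df: pd.DataFrame, out_path: Optional[str] = None) -> pd.DataFrame:
    """Clean scraped listings and optionally save them to a filtered .csv."""
    df = df.copy()
    # Unreadable numeric values (e.g. "n/a") become NaN so they can be dropped.
    for col in ("price", "area", "bedrooms"):
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # Drop rows missing the target or key features.
    df = df.dropna(subset=["price", "area", "bedrooms"])
    # Normalise spelling, then merge labels that mean the same thing.
    df["building_state"] = (
        df["building_state"].str.strip().str.lower().replace(STATE_SYNONYMS)
    )
    # Turn string categories into 0/1 dummy columns.
    df = pd.get_dummies(df, columns=["property_type", "building_state"], dtype=int)
    if out_path is not None:
        df.to_csv(out_path, index=False)
    return df
```

Calling `preprocess(raw_df, "filtered.csv")` would return the cleaned frame and write it to disk in one go.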
Once we have clean data, we use the train() function from the prediction.py file in our predict folder to train our model on the filtered data. This is also where the Machine Learning model can be swapped out to improve accuracy in the future. Just run train() with the correct path to the .csv, and it will save the trained Machine Learning model to our model folder with Pickle.
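The training step could be sketched roughly as follows. `LinearRegression`, the 80/20 split, the `price` target column and the `model/model.pkl` path are assumptions for illustration; the README does not say which model the project uses. `fit_model` is a hypothetical helper split out of `train()` so the fitting logic can be tried without touching the filesystem.

```python
import pickle
from pathlib import Path

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def fit_model(df: pd.DataFrame, target: str = "price"):
    """Fit a regressor on every non-target column; return (model, R^2 on held-out rows)."""
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    # Swap this estimator for another scikit-learn regressor to experiment with accuracy.
    model = LinearRegression()
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


def train(csv_path: str, model_path: str = "model/model.pkl") -> float:
    """Train on the filtered .csv and pickle the fitted model to model_path."""
    model, r2 = fit_model(pd.read_csv(csv_path))
    Path(model_path).parent.mkdir(parents=True, exist_ok=True)
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return r2
```

Returning the held-out R² gives a quick way to compare models when the estimator is swapped, for example for a `RandomForestRegressor`.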
Folder used: /Predict
Try it for yourself here. Once the model is trained and saved with Pickle, our predict() function (currently inside the app.py file) takes the input given by the user and applies our Machine Learning model to it. Flask then returns the estimated price of the given house. Use the Home and Predict pages to navigate and query a new estimate.
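A rough sketch of how a Flask app could wire the pickled model to the Home and Predict pages. The template names (`index.html`, `predict.html`), the form fields (`area`, `bedrooms`, `property_type`) and the helpers `form_to_features`, `create_app` and `load_model` are hypothetical, not taken from the project's app.py. It assumes a scikit-learn model fitted on a DataFrame, so the training columns can be recovered from its `feature_names_in_` attribute.

```python
import pickle

import pandas as pd
from flask import Flask, render_template, request


def load_model(path="model/model.pkl"):
    """Load the pickled model (the path is an assumption)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def form_to_features(form, columns):
    """Turn submitted form fields into a one-row frame matching the training columns."""
    row = dict.fromkeys(columns, 0)
    row["area"] = float(form["area"])
    row["bedrooms"] = int(form["bedrooms"])
    # Switch on the dummy column for the chosen category, e.g. "property_type_house".
    dummy = f"property_type_{form['property_type']}"
    if dummy in row:
        row[dummy] = 1
    return pd.DataFrame([row], columns=columns)


def create_app(model, columns):
    app = Flask(__name__)

    @app.route("/")
    def home():
        return render_template("index.html")

    @app.route("/predict", methods=["GET", "POST"])
    def predict():
        if request.method == "GET":
            return render_template("predict.html")
        features = form_to_features(request.form, columns)
        price = float(model.predict(features)[0])
        return render_template("predict.html", prediction=round(price))

    return app
```

To serve it locally, something like `model = load_model(); create_app(model, list(model.feature_names_in_)).run()` would be enough. Building the feature row from the training columns keeps the user input aligned with the dummy columns the model was trained on.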
This project can still be improved with more data, other Machine Learning models, and a more extensive app. Feel free to play around with the estimates and build an even better model or app.