RealValue

About

RealValue is a machine learning project for predicting home prices in Toronto. It combines TensorFlow convolutional neural networks with a dense network component: owners take a few pictures of their home, enter some simple details, and receive an estimated price range for what their home is worth. This ease of use helps homeowners make residential decisions with more confidence and stay better informed about the real estate market.

For more details about our project, please take a look at our Medium article here.

Also, check out our website at real-value.ca to try out some of our algorithms!

Motivation and Goal

In real estate, predicting the "right price" for a property is attracting growing interest. Most current algorithms use only statistical information about a property as input to predict its price. However, these algorithms leave out a form of data that often shapes a buyer's perception: visual data of the house. Convolutional neural networks (CNNs) have recently grown in prominence for their ability to extract strong feature representations from images and use those representations to map visual inputs accurately to scalar or vector outputs.

Our goal was to create a custom convolutional neural network to accurately predict Toronto housing prices with less than 20% error.

Features and Overview

- Combined CNN and dense network model
  - Easy to swap CNN model architectures
  - Easy to change dense network size
- Transfer learning using California and Toronto housing datasets
  - California Dataset
  - Modified dataset that uses latitude/longitude values in place of postal codes
- Custom Toronto Dataset we collected in February 2021 (157 houses)
- Image data augmentation (cropping, rotation, mirroring, saturation, brightness)
- Inputs to the network:
  - 2x2 mosaic image (bedroom, bathroom, kitchen, frontal view)
  - Number of bedrooms, bathrooms, square feet, and postal code (price is the prediction target)
- Configurable training (hyperparameters, model architecture) with config.yaml
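A minimal sketch of how the 2x2 mosaic input could be assembled with NumPy, assuming the four photos have already been resized to the same height and width. The quadrant order (bedroom top-left, bathroom top-right, kitchen bottom-left, frontal view bottom-right) and the function name `make_mosaic` are hypothetical; the README does not state the layout the pipeline actually uses.

```python
import numpy as np


def make_mosaic(bedroom, bathroom, kitchen, frontal):
    """Tile four equally sized H x W x C images into one 2H x 2W x C mosaic.

    Quadrant order is an assumption: bedroom top-left, bathroom top-right,
    kitchen bottom-left, frontal view bottom-right.
    """
    images = [bedroom, bathroom, kitchen, frontal]
    shape = images[0].shape
    if any(img.shape != shape for img in images):
        raise ValueError("all four images must have the same shape; resize them first")
    top = np.concatenate([bedroom, bathroom], axis=1)  # place side by side
    bottom = np.concatenate([kitchen, frontal], axis=1)
    return np.concatenate([top, bottom], axis=0)  # stack the two rows


# Example with dummy 64x64 RGB images filled with distinct values
tiles = [np.full((64, 64, 3), v, dtype=np.uint8) for v in (10, 20, 30, 40)]
mosaic = make_mosaic(*tiles)
print(mosaic.shape)  # (128, 128, 3)
```

Combining the four views into one image lets a single CNN process all of them at once, at the cost of halving the resolution of each view for a fixed input size.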

Installation and Quick Setup

To download our code, run:

git clone <url>

Dependent packages: tensorflow 2.3.0, matplotlib, opencv-python, numpy, pandas, keras, scikit-learn (imported as sklearn)

To install the dependent packages, run:

pip install -r requirements.txt

Dataset

California Dataset

We initially trained our network on a dataset of California houses, created by Ahmed and Moustafa, consisting of both structured data (statistical property information in tabular form) and unstructured data (images). This dataset contains information for over 500 houses, each with 4 images of a bathroom, bedroom, kitchen, and frontal view. Statistical information for each house includes the number of bedrooms, bathrooms, square footage, postal code, and price.
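A minimal sketch of reading the California house attributes with pandas. It assumes the attributes sit in a space-separated text file with no header row and the columns bedrooms, bathrooms, area, zipcode, price, and that each house's photos are named like `1_kitchen.jpg`; both are assumptions to check against the files in raw_dataset. `load_house_attributes`, `image_paths`, and the sample rows are hypothetical.

```python
import os
from io import StringIO

import pandas as pd

# Assumed column order of the attributes file (one house per line, no header)
COLUMNS = ["bedrooms", "bathrooms", "area", "zipcode", "price"]
ROOMS = ("bathroom", "bedroom", "kitchen", "frontal")


def load_house_attributes(path_or_buffer):
    """Read the space-separated attributes file into a DataFrame."""
    return pd.read_csv(path_or_buffer, sep=" ", header=None, names=COLUMNS)


def image_paths(house_number, folder):
    """Map each room to its assumed image file name, e.g. '1_kitchen.jpg'."""
    return {room: os.path.join(folder, f"{house_number}_{room}.jpg") for room in ROOMS}


# Made-up sample rows in the assumed format
sample = StringIO("4 4.0 4053 85255 869500\n3 2.0 1776 92677 329000\n")
houses = load_house_attributes(sample)
print(houses["price"].tolist())  # [869500, 329000]
print(image_paths(1, "raw_dataset")["kitchen"])
```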

Toronto Dataset

We created our own Toronto real estate dataset by compiling the images, prices, number of rooms, floor areas, and postal codes of houses listed on the Toronto Regional Real Estate Board (TRREB) website. A Python script calculated the area of each house from the provided measurements of each room. As with the California dataset, each house in the Toronto dataset has four images: frontal view, bedroom, bathroom, and kitchen.
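The area calculation described above could look roughly like the sketch below. It assumes room measurements arrive as strings such as "3.50 x 4.00" in metres and that the total is reported in square feet; the input format, the units, and the names `parse_dims` and `total_area_sqft` are hypothetical, since the README does not show the actual script.

```python
SQFT_PER_SQM = 10.7639  # 1 square metre is about 10.7639 square feet


def parse_dims(text):
    """Parse an assumed 'length x width' string (metres) into two floats."""
    length, width = (float(part) for part in text.lower().split("x"))
    return length, width


def total_area_sqft(room_dims_m):
    """Sum length * width over all rooms (metres) and convert to square feet."""
    area_sqm = sum(length * width for length, width in room_dims_m)
    return area_sqm * SQFT_PER_SQM


# Made-up listing measurements
rooms = [parse_dims(s) for s in ("3.50 x 4.00", "3.00 x 3.00", "2.50 x 2.00")]
print(round(total_area_sqft(rooms), 1))  # 301.4
```

Summing room rectangles ignores hallways, stairs, and wall thickness, so a figure computed this way will usually come out below the house's full floor area.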

Advanced Configuration

Training the model with `config.yaml` and `models/`

Our model's hyperparameters are stored in config.yaml. To start training, modify config.yaml if needed and run the following command:

python pipeline.py

Import mode `True` vs `False`

Since data augmentation can take considerable time, you can set import_mode in config.yaml to skip augmentation and start training immediately.

On the first run, set import_mode: False in config.yaml to perform data augmentation. On later runs, you can set import_mode: True to skip augmentation and reuse the previously augmented data. Setting import_mode: False always works; it is just slower.

Note: If you switch or modify the dataset or the augmentation multiplier, make sure to use import_mode: False for the first run so the augmented data is regenerated.
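The import_mode behaviour can be pictured as a simple cache, as in this minimal sketch. The function name `get_training_arrays`, the `augment_fn` callback, and the cache file names are hypothetical; the sketch only illustrates why reusing cached arrays after changing the dataset gives stale data.

```python
import os
import tempfile

import numpy as np


def get_training_arrays(import_mode, augment_fn, cache_dir="array_files"):
    """Return (images, labels), reusing cached augmented arrays when import_mode is True.

    augment_fn is a hypothetical callable that performs the slow augmentation
    and returns (images, labels) as NumPy arrays.
    """
    images_path = os.path.join(cache_dir, "images.npy")
    labels_path = os.path.join(cache_dir, "labels.npy")
    if import_mode:
        # Skip augmentation and load whatever a previous run saved.
        # Raises FileNotFoundError if nothing has been cached yet.
        return np.load(images_path), np.load(labels_path)
    images, labels = augment_fn()  # slow step
    os.makedirs(cache_dir, exist_ok=True)
    np.save(images_path, images)
    np.save(labels_path, labels)
    return images, labels


def fake_augment():
    # Stand-in for real augmentation: 4 tiny images and their prices
    return np.zeros((4, 8, 8, 3)), np.array([1.0, 2.0, 3.0, 4.0])


with tempfile.TemporaryDirectory() as tmp:
    first = get_training_arrays(False, fake_augment, cache_dir=tmp)  # augments and saves
    second = get_training_arrays(True, fake_augment, cache_dir=tmp)  # loads from disk
```

Because the True branch never looks at the current dataset, switching datasets with import_mode: True would silently train on the old augmented arrays.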

Adding/choosing different model architectures and other hyperparameters

To change hyperparameters such as the learning rate or optimizer, edit the corresponding lines in config.yaml.

In particular, the CNN model and dense network layers are set by the following lines:

```yaml
# Train using RegNet as CNN and a 2 layer dense network (8 units in first layer, 4 units in second layer)
CNN_model: 'RegNet'
dense_model:
  - 8
  - 4
```

The number of dense layers and their size can be changed using config.yaml.
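A minimal tf.keras sketch of how a `dense_model` list such as `[8, 4]` could be turned into a stack of Dense layers for the statistical inputs. The ReLU activation, the four numeric input features, and the name `build_dense_branch` are assumptions; how the repository actually wires these layers is defined in models/.

```python
import numpy as np
import tensorflow as tf


def build_dense_branch(dense_model, num_features):
    """Build one ReLU Dense layer per entry of dense_model (e.g. [8, 4])."""
    inputs = tf.keras.Input(shape=(num_features,))
    x = inputs
    for units in dense_model:  # one layer per list entry, in order
        x = tf.keras.layers.Dense(units, activation="relu")(x)
    return tf.keras.Model(inputs, x, name="dense_branch")


# Hypothetical usage: 4 numeric features per house
branch = build_dense_branch([8, 4], num_features=4)
features = np.zeros((2, 4), dtype="float32")
print(branch(features).shape)  # (2, 4)
```

Driving the layer count from a list keeps the architecture change a one-line edit in config.yaml rather than a code change.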

Changing the CNN is more involved, but still straightforward. As a basic working example, see how we defined LeNet as a CNN in models/CNN_models/lenet.py and then used it in get_network() in models/__init__.py. To add your own CNN (called CustomNet here):

- Define a function that returns your custom CNN as a tf.keras.Model in a new file, models/CNN_models/CustomNet.py.
- Modify get_network() in models/__init__.py to call your new function.
- Change your config.yaml to have `CNN_model: 'CustomNet'`.
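As an illustration of the first step, here is a minimal sketch of a hypothetical `custom_net` function returning a small CNN as a tf.keras.Model, plus a hypothetical `combine` helper that joins its image features with a dense branch for the statistical inputs and regresses one price. The layer sizes, the 64x64 mosaic size, and the concatenation design are assumptions, not the project's get_network(); in get_network() you would add a branch that returns `custom_net(...)` when `CNN_model` is `'CustomNet'`.

```python
import numpy as np
import tensorflow as tf


def custom_net(input_shape=(64, 64, 3)):
    """Hypothetical CustomNet: two conv blocks ending in a 16-d feature vector."""
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(16, activation="relu")(x)
    return tf.keras.Model(inputs, outputs, name="CustomNet")


def combine(cnn, num_stats, dense_units=(8, 4)):
    """Concatenate CNN image features with a dense branch and predict one price."""
    stats_in = tf.keras.Input(shape=(num_stats,))
    s = stats_in
    for units in dense_units:
        s = tf.keras.layers.Dense(units, activation="relu")(s)
    merged = tf.keras.layers.concatenate([cnn.output, s])
    price = tf.keras.layers.Dense(1)(merged)  # linear output for regression
    return tf.keras.Model([cnn.input, stats_in], price)


cnn = custom_net()
model = combine(cnn, num_stats=4)
images = np.zeros((2, 64, 64, 3), dtype="float32")
stats = np.zeros((2, 4), dtype="float32")
print(model([images, stats]).shape)  # (2, 1)
```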

Initial training on California Dataset

To train on the California dataset, specify directory: 'raw_dataset' in the first line of the config.yaml file. The California dataset is located in the raw_dataset directory.

Transfer learning on Toronto Dataset

To apply transfer learning on the Toronto Dataset, specify directory: 'toronto_raw_dataset' in the first line of the config.yaml file. The Toronto dataset is located in the toronto_raw_dataset directory.

Remember to set import_mode: False whenever you switch datasets.
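The README does not say which layers, if any, are frozen during transfer learning. One common approach, shown here as a minimal sketch under that assumption, is to freeze the convolutional layers learned on California, keep the dense layers trainable, and recompile with a smaller learning rate before fitting on the Toronto data. `prepare_for_fine_tuning`, the learning rate, and the MAPE loss are hypothetical choices.

```python
import tensorflow as tf


def prepare_for_fine_tuning(model, learning_rate=1e-4):
    """Freeze Conv2D layers trained on California; leave other layers trainable."""
    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.Conv2D):
            layer.trainable = False
    # Recompile so the new trainable flags take effect in fit()
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="mean_absolute_percentage_error",
    )
    return model


# Stand-in for a model already trained on the California dataset; in practice it
# might be loaded with tf.keras.models.load_model(<path to your saved model>)
inputs = tf.keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(8, 3, activation="relu")(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1)(x)
model = prepare_for_fine_tuning(tf.keras.Model(inputs, outputs))
# Next step (not shown): model.fit(toronto_inputs, toronto_prices, epochs=...)
```

If the CNN is nested as a sub-model (for example a prebuilt architecture), `model.layers` lists that sub-model rather than its Conv2D layers, so you would set `trainable = False` on the sub-model instead.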

Results

We achieved a test error of 21% using a Zip Code approach on the California dataset, and a test error of 15% using a Latitude and Longitude approach.
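The README does not define "test error". Assuming it means mean absolute percentage error (MAPE), a common metric for price regression, the calculation can be sketched in NumPy as below; `mean_absolute_percentage_error` is a hypothetical helper rather than the project's code, and the prices are made up.

```python
import numpy as np


def mean_absolute_percentage_error(y_true, y_pred):
    """Average of |prediction - actual| / |actual|, expressed as a percentage."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) / np.abs(y_true)) * 100)


actual = [500_000, 800_000, 1_000_000]
predicted = [550_000, 720_000, 1_000_000]
print(round(mean_absolute_percentage_error(actual, predicted), 2))  # 6.67
```

Because each error is divided by the true price, a percentage metric treats a $50,000 miss on a $500,000 house the same as a $100,000 miss on a $1,000,000 house.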

With the Zip Code approach, our error is nearly 6% lower than that of contemporary approaches such as https://www.pyimagesearch.com/2019/02/04/keras-multiple-inputs-and-mixed-data/.


mattleung10/RealValue-Local
