This work is a personal study of Seattle's housing market and is not intended for commercial use. Unfortunately, the data is limited to 2014-2015, so extending the work to today's [crazy] market is not practical. Data exploration and ML modeling are captured in multiple Jupyter notebooks in this repository. The problem is initially framed as a regression exercise, and both a simple linear model and tree-based algorithms were used. Although Random Forest (as usual) offered a significant improvement, a regularized (Ridge) linear model was chosen for this study because its coefficients remain interpretable for statistical inference.
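For reference, ridge regression adds an L2 penalty to ordinary least squares. On centered data its coefficients have the closed form $\hat\beta = (X^\top X + \alpha I)^{-1} X^\top y$, with the intercept recovered from the column means. The sketch below implements that formula in NumPy on synthetic data. The function name `ridge_fit`, the penalty values, and the data are illustrative assumptions, not the notebooks' actual code (which may well use scikit-learn's `Ridge`).

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    Solves (Xc^T Xc + alpha*I) beta = Xc^T yc on mean-centered data,
    then recovers the intercept from the column means.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    n_features = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


# Synthetic stand-in for the housing features (NOT the King County data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_beta = np.array([0.5, -0.2, 0.1])
y = 12.0 + X @ true_beta + rng.normal(scale=0.1, size=200)

# Larger alpha values shrink the coefficients toward zero
for alpha in (0.0, 10.0, 1000.0):
    b0, beta = ridge_fit(X, y, alpha)
    print(alpha, round(b0, 3), np.round(beta, 3))
```

With `alpha=0` the result coincides with ordinary least squares. As `alpha` grows, the coefficient vector shrinks, trading a little bias for more stable, less variable estimates.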
Acknowledgement: The original dataset comes from Kaggle (House Sales in King County, USA), where it is provided for predicting house prices using regression.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# Load the cleaned dataset and drop the leftover index column ('Unnamed: 0') saved with it
data = pd.read_csv("clean_data.csv")
data.drop(['Unnamed: 0'], axis=1, inplace=True)
data.head()
| | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | condition | grade | ... | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 | year | month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12.309982 | 3 | 1.00 | 1180 | 8.639411 | 1.0 | 0 | 0 | 3 | 7 | ... | 0 | 1955 | 0 | 98178 | 47.5112 | -122.257 | 1340 | 8.639411 | 2014 | 10 |
| 1 | 13.195614 | 3 | 2.25 | 2570 | 8.887653 | 2.0 | 0 | 0 | 3 | 7 | ... | 400 | 1951 | 1 | 98125 | 47.7210 | -122.319 | 1690 | 8.941022 | 2014 | 12 |
| 2 | 12.100712 | 2 | 1.00 | 770 | 9.210340 | 1.0 | 0 | 0 | 3 | 6 | ... | 0 | 1933 | 0 | 98028 | 47.7379 | -122.233 | 2720 | 8.994917 | 2015 | 2 |
| 3 | 13.311329 | 4 | 3.00 | 1960 | 8.517193 | 1.0 | 0 | 0 | 5 | 7 | ... | 910 | 1965 | 0 | 98136 | 47.5208 | -122.393 | 1360 | 8.517193 | 2014 | 12 |
| 4 | 13.142166 | 3 | 2.00 | 1680 | 8.997147 | 1.0 | 0 | 0 | 3 | 8 | ... | 0 | 1987 | 0 | 98074 | 47.6168 | -122.045 | 1800 | 8.923058 | 2015 | 2 |

5 rows × 21 columns
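The `price` and `sqft_lot` values in this cleaned file look like natural logarithms rather than raw dollars and square feet. For example, exp(12.309982) ≈ 221,900 and exp(8.639411) ≈ 5,650, and `sqft_lot15` appears to be log-scaled as well. Assuming that is the case, `np.exp` recovers the original units. The sketch below applies it to the five rows shown above; the `sample` frame is just those values copied by hand.

```python
import numpy as np
import pandas as pd

# Values copied from the five rows shown in the table above
sample = pd.DataFrame({
    "price": [12.309982, 13.195614, 12.100712, 13.311329, 13.142166],
    "sqft_lot": [8.639411, 8.887653, 9.210340, 8.517193, 8.997147],
})

# Assumption: both columns hold natural logs, so exp() undoes the transform
restored = np.exp(sample).round()
print(restored)
```

Working on the log scale is a common choice for skewed quantities like price and lot size. A linear model fit to log(price) then describes approximately multiplicative (percentage) effects.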
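import plotly.graph_objs as go
import chart_studio
import chart_studio.plotly as py
# Plotly 4+ moved online (Chart Studio) plotting out of plotly.plotly into the chart_studio package
chart_studio.tools.set_credentials_file(username='----', api_key='----')
mapbox_access_token = '-----'
# One small marker per house sale, placed at its latitude/longitude
data_ = [
    go.Scattermapbox(
        lat=list(data.lat),
        lon=list(data.long),
        mode='markers',
        marker=dict(size=2),
        text=[''],
    )
]
# Map layout centered near Seattle, using the Mapbox access token above
layout = go.Layout(
    autosize=True,
    hovermode='closest',
    mapbox=dict(
        accesstoken=mapbox_access_token,
        bearing=0,
        center=dict(
            lat=47.5,
            lon=-122
        ),
        pitch=0,
        zoom=5
    ),
)
fig = dict(data=data_, layout=layout)
py.iplot(fig)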
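The map layout hard-codes `center=dict(lat=47.5, lon=-122)`. The following is a small sketch of deriving the center and a bounding box from the coordinates instead. It uses the five sample rows from the table as a stand-in for the full `data` frame; `coords`, `center`, and `bounds` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Coordinates of the five sample rows shown in the table (stand-in for `data`)
coords = pd.DataFrame({
    "lat": [47.5112, 47.7210, 47.7379, 47.5208, 47.6168],
    "long": [-122.257, -122.319, -122.233, -122.393, -122.045],
})

# Center the map on the mean coordinate instead of a hard-coded point
center = dict(lat=coords["lat"].mean(), lon=coords["long"].mean())

# Bounding box, handy for checking that the chosen zoom shows every point
bounds = {
    "lat": (coords["lat"].min(), coords["lat"].max()),
    "lon": (coords["long"].min(), coords["long"].max()),
}
print(center, bounds)
```

With the full dataset, `dict(lat=data.lat.mean(), lon=data.long.mean())` could be passed as `center` inside the `mapbox` layout dict.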