rushhemant/Surprise-Housing-Co---Linear-Regression

Surprise-Housing-Co---Linear-Regression

Determine which variables are significant in predicting the price of a house, and how well those variables describe that price.

Read Data

  • Import important libraries
  • Read housing data into dataframe
  • Quick review of dataframe

Data Preparation

  • Check missing values in dataframe
  • Drop columns with more than 80% missing values
  • Impute LotFrontage
  • Impute FireplaceQu
  • Impute Garage related fields
  • Impute Basement related fields
  • Impute missing categorical variables with mode
  • Impute missing quantitative variables with median
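
A minimal sketch of these preparation steps on a small made-up frame. The columns `PoolQC` and `GarageType`, the choice of `"None"` as the fill value for `FireplaceQu`, and the use of the median for `LotFrontage` are assumptions for illustration; the repository may impute these fields differently.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; PoolQC and GarageType are assumed column names.
housing = pd.DataFrame({
    "PoolQC": [np.nan, np.nan, np.nan, np.nan, np.nan, "Gd"],
    "LotFrontage": [65.0, np.nan, 80.0, 70.0, np.nan, 60.0],
    "FireplaceQu": ["TA", np.nan, "Gd", np.nan, "TA", np.nan],
    "GarageType": ["Attchd", "Attchd", np.nan, "Detchd", "Attchd", "Attchd"],
})

# Fraction of missing values per column.
missing_frac = housing.isna().mean()
print(missing_frac)

# Drop columns with more than 80% missing values.
housing = housing.loc[:, missing_frac <= 0.80].copy()

# Assumption: a missing FireplaceQu means the house has no fireplace.
housing["FireplaceQu"] = housing["FireplaceQu"].fillna("None")

# Remaining gaps: median for numeric columns, mode for categorical ones.
for col in housing.columns:
    if pd.api.types.is_numeric_dtype(housing[col]):
        housing[col] = housing[col].fillna(housing[col].median())
    else:
        housing[col] = housing[col].fillna(housing[col].mode()[0])
```

The median is less sensitive to extreme values than the mean, which is one common reason to prefer it for skewed quantitative columns.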

Feature Engineering

  • Calculate age of house when sold
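
As a rough sketch, the age feature is a difference of two year columns. The names `YearBuilt`, `YrSold`, and `AgeWhenSold` are assumptions here, not taken from the repository.

```python
import pandas as pd

# Hypothetical sample; YearBuilt and YrSold are assumed column names.
housing = pd.DataFrame({
    "YearBuilt": [2003, 1976, 1915],
    "YrSold": [2008, 2007, 2006],
})

# Age of the house at the time of sale, in years.
housing["AgeWhenSold"] = housing["YrSold"] - housing["YearBuilt"]
print(housing)
```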

Data Analysis

  • Check distribution of target variable
  • Transform target variable (log transformation)
  • Create list of numeric and non-numeric columns
  • Analyze outliers from quantitative variables
  • Remove outliers from numerical data
  • Bar plots of quantitative variables vs SalePrice
  • Analyze impact of categorical values on price of house
  • Checking correlation of quantitative variables in housing
  • Pairplots for numerical variables to understand linear relationship
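
The log transformation and outlier removal might look like the sketch below. The outlier rule is not stated above, so the 1.5 × IQR rule used here is an assumed choice, and `GrLivArea` and `SalePriceLog` are illustrative column names.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one extreme GrLivArea value.
housing = pd.DataFrame({
    "GrLivArea": [1200, 1500, 1700, 1600, 1400, 9000],
    "SalePrice": [150000, 180000, 210000, 200000, 170000, 190000],
})

# Log-transform the target to reduce right skew.
housing["SalePriceLog"] = np.log(housing["SalePrice"])

# Assumed rule: keep rows within 1.5 * IQR of the first and third quartiles.
q1 = housing["GrLivArea"].quantile(0.25)
q3 = housing["GrLivArea"].quantile(0.75)
iqr = q3 - q1
in_range = housing["GrLivArea"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
housing = housing[in_range]

# Correlation of each numeric column with the transformed target.
print(housing.corr()["SalePriceLog"])
```

Predictions made on the log scale can be mapped back to prices with `np.exp`.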

Data Preparation for Modeling

  • Dummy variable encoding (one-hot) for other categorical variables
  • Splitting the Data into Training and Testing Sets
  • Create X and y sets
  • Scaling the variables using StandardScaler (standardization)
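
A minimal sketch of the modeling preparation, following the order listed above. The data, the `Neighborhood` column, the 70/30 split, and the random seed are all assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical sample with one numeric and one categorical predictor.
housing = pd.DataFrame({
    "GrLivArea": [1200, 1500, 1700, 1600, 1400, 1800, 1300, 1900, 1100, 2000],
    "Neighborhood": ["A", "B", "A", "C", "B", "C", "A", "B", "C", "A"],
    "SalePriceLog": [11.9, 12.1, 12.3, 12.2, 12.0, 12.4, 11.95, 12.45, 11.8, 12.5],
})

# One-hot encode; drop_first removes one redundant dummy per category.
encoded = pd.get_dummies(
    housing, columns=["Neighborhood"], drop_first=True, dtype=float
)

# Split into training and test rows (70/30 and the seed are assumed).
train, test = train_test_split(encoded, train_size=0.7, random_state=42)

# Separate predictors (X) from the target (y).
X_train, y_train = train.drop(columns="SalePriceLog"), train["SalePriceLog"]
X_test, y_test = test.drop(columns="SalePriceLog"), test["SalePriceLog"]

# Fit the scaler on the training set only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training set alone keeps test-set statistics from leaking into the model.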

Ridge Regression

  • Tune hyperparameter using GridSearchCV
  • Plotting scores to determine optimal alpha
  • Build Ridge regression using best alpha
  • Prediction using ridge regression
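
The ridge workflow can be sketched with scikit-learn as below. The synthetic data, the alpha grid, the 5-fold setting, and the `r2` scoring are assumptions; the repository's actual grid and data are not shown here.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: 5 predictors, two of which have no true effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)
X_train, X_test, y_train, y_test = X[:70], X[70:], y[:70], y[70:]

# Assumed grid of regularization strengths to try.
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(
    Ridge(), param_grid, scoring="r2", cv=5, return_train_score=True
)
search.fit(X_train, y_train)

# Mean train and validation scores per alpha; these are the values to plot.
for alpha, tr, va in zip(
    param_grid["alpha"],
    search.cv_results_["mean_train_score"],
    search.cv_results_["mean_test_score"],
):
    print(f"alpha={alpha}: train r2={tr:.4f}, validation r2={va:.4f}")

# Refit with the best alpha and predict on the held-out rows.
best_alpha = search.best_params_["alpha"]
ridge = Ridge(alpha=best_alpha).fit(X_train, y_train)
y_pred = ridge.predict(X_test)
print("test r2:", ridge.score(X_test, y_test))
```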

Lasso Regression

  • Tune hyperparameter using GridSearchCV
  • Plotting scores to determine optimal alpha
  • Build Lasso regression model using best alpha
  • Prediction using lasso regression
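
A comparable sketch for lasso, again on synthetic data with an assumed alpha grid. Unlike ridge, lasso can shrink coefficients exactly to zero, so the number of non-zero coefficients shows how many variables the model keeps.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: 5 predictors, two of which have no true effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)
X_train, X_test, y_train, y_test = X[:70], X[70:], y[:70], y[70:]

# Assumed grid of regularization strengths to try.
param_grid = {"alpha": [0.001, 0.01, 0.1, 1.0]}
search = GridSearchCV(Lasso(), param_grid, scoring="r2", cv=5)
search.fit(X_train, y_train)

# Refit with the best alpha and predict on the held-out rows.
best_alpha = search.best_params_["alpha"]
lasso = Lasso(alpha=best_alpha).fit(X_train, y_train)
y_pred = lasso.predict(X_test)

# Count the predictors whose coefficients were not shrunk to zero.
n_selected = int(np.sum(lasso.coef_ != 0))
print("best alpha:", best_alpha)
print("non-zero coefficients:", n_selected, "of", lasso.coef_.size)
print("train r2:", lasso.score(X_train, y_train))
print("test r2:", lasso.score(X_test, y_test))
```

Comparing the train and test r2 side by side is a quick check for overfitting: a large gap suggests the model does not generalize well.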

Model Conclusion

  • We will use lasso for the final model prediction because:
      • The scores are higher and consistent
      • The model is simpler than ridge (fewer variables)
  • Final scores of the model:
      • Lasso regression train r2: 0.9281
      • Lasso regression test r2: 0.9122
