
mhsmario/Final-Project---Applied-Data-Science


# Applied-Data-Science--Spring-2018

This document outlines the structure of this repo, as well as the key aspects of the project.

| -- Src
	| -- LibraryInstaller.R
	| -- DefineFunctions.R
| -- data
	| -- raw
	| -- processed
| -- References
	| -- README.md

## Project outline

Group members: Mario Saraiva, Lizhizi Cui

Start date: March 01, 2018

End date: May 10, 2018

This project is based on the Kaggle competition "House Prices: Advanced Regression Techniques".

The data is available at: https://www.kaggle.com/c/house-prices-advanced-regression-techniques.

Outcomes:

  1. Executive report with findings, including but not limited to:
     • Different predictive models
     • The pros and cons of each model
     • Reflections and recommendations

### Phase 0: Project setup

  • Create the repo and folder structure, and outline the tasks.

### Phase 1: Exploratory data analysis

  • Understand how the data is distributed (histograms, scatter plots)
  • Produce descriptive statistics and summaries
  • Extract important input variables for the analysis
  • Identify outliers
  • Identify patterns (if any)
  • Make a ranked list of important input variables
  • Assess how robust the conclusions are (e.g., to sample bias)
  • Conclude whether individual factors are statistically significant
  • Quantify the uncertainty of important estimates
  • Define the problem and purpose of the project (including assumptions)
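The descriptive-statistics, outlier, and variable-ranking steps above can be sketched in a few lines. This is a minimal Python sketch (the repo's own scripts are R files), assuming each column is already loaded as a list of numbers. The toy values are invented for illustration and do not come from the Kaggle data; only the column names (`SalePrice`, `GrLivArea`, `OverallQual`, `YearBuilt`) follow the competition's training file. Flagging outliers with Tukey's 1.5 × IQR fences and ranking variables by absolute Pearson correlation with `SalePrice` are assumed choices, not the project's stated method.

```python
import math
import statistics

# Invented toy values for illustration only (NOT from the Kaggle dataset).
# Column names follow the competition's training data.
DATA = {
    "GrLivArea": [1000, 1200, 1400, 1600, 1800, 2000, 2200, 5000],
    "OverallQual": [5, 5, 6, 6, 7, 7, 8, 6],
    "YearBuilt": [1960, 1990, 1970, 2000, 1980, 1965, 2005, 1995],
    "SalePrice": [100000, 120000, 140000, 160000, 180000, 200000, 220000, 240000],
}


def describe(values):
    """Basic descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # needs Python 3.8+
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(data, target="SalePrice"):
    """Rank input variables by |correlation| with the target, strongest first."""
    scores = [
        (name, pearson(col, data[target]))
        for name, col in data.items()
        if name != target
    ]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    print(describe(DATA["SalePrice"]))
    print("GrLivArea outliers:", iqr_outliers(DATA["GrLivArea"]))
    for name, r in rank_by_correlation(DATA):
        print(f"{name:12s} r = {r:+.3f}")
```

The single extreme value in the toy `GrLivArea` column (5000) is flagged as an outlier and also shifts that column's correlation, which is one reason to inspect outliers before trusting a correlation-based ranking.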

### Phase 2: Design the different models to be tested

### Phase 3: Test models
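One common way to test models is k-fold cross-validation: the rows are split into k folds, and each fold is held out once as a test set while the model is fit on the remaining rows. Below is a minimal Python sketch, assuming rows are referenced by integer index; `k_fold_indices` is a hypothetical helper, and k-fold cross-validation is an assumed choice rather than the project's documented procedure.

```python
import random


def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Rows are shuffled once with a fixed seed so the folds are reproducible;
    every row appears in exactly one test fold.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    # Nearly equal fold sizes: the first n_rows % k folds get one extra row.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = idx[start:start + size]
        train_idx = idx[:start] + idx[start + size:]
        yield train_idx, test_idx
        start += size


if __name__ == "__main__":
    for fold, (train, test) in enumerate(k_fold_indices(10, k=3)):
        print(fold, len(train), len(test), sorted(test))
```

Averaging a model's error over the k held-out folds gives a less noisy estimate than a single train/test split, which makes later comparisons between models more reliable.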

### Phase 4: Compare models
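The Kaggle competition scores submissions by the root-mean-squared error (RMSE) between the logarithm of the predicted and the logarithm of the observed sale price, so the same metric is a natural yardstick for comparing models. Below is a minimal Python sketch; the observed prices and both sets of predictions are invented for illustration, `rmse_log` is a hypothetical helper, and the natural logarithm is assumed (changing the base rescales every score by the same factor, so the ranking of models is unaffected).

```python
import math


def rmse_log(y_true, y_pred):
    """RMSE between log(predicted) and log(observed) prices (natural log assumed)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    if any(v <= 0 for v in list(y_true) + list(y_pred)):
        raise ValueError("prices must be positive to take logarithms")
    sq_errors = [(math.log(p) - math.log(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Invented numbers for illustration only.
observed = [100000, 150000, 200000, 250000]
predictions = {
    "model_a": [110000, 140000, 210000, 240000],
    "model_b": [150000, 150000, 150000, 150000],  # e.g. a constant baseline
}

if __name__ == "__main__":
    # Lower is better: list models from best to worst.
    scores = {name: rmse_log(observed, pred) for name, pred in predictions.items()}
    for name, score in sorted(scores.items(), key=lambda item: item[1]):
        print(f"{name}: {score:.4f}")
```

Because errors are measured on a log scale, over-predicting a cheap house by 10% costs the same as over-predicting an expensive one by 10%.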

### Phase 5: Compile findings into a report

About

Repo for an applied data science project, Spring 2018.
