This repo hosts the materials for the joblogic-x data science class 2017.

Tian-Su/intro_to_data_science_2017

Congratulations Class 2017!!!

github concepts

Left to right: Yingxi Zhao, Wei Lu, Tian Su, Fangjing Xu, Xiongxing Li, Xiaoxuan Guo, Tengran Liu, Xuemin Zhang

intro_to_data_science_2017

This repo hosts the materials for the joblogic-x data science class 2017.

Folder introduction

The Data folder hosts practice datasets.

Class01-03 are the preparation classes. They cover the basics of GitHub, Python, and Python packages such as pandas.

Bootcamp_day1-day4 hold the four-day data science bootcamp. Contents include data wrangling, modeling, and object-oriented programming in Python.

Major modeling steps

Frame the question based on business need

Data collection

Data & dictionary

Data cleaning & exploration

split data

missing imputation

basic feature selection

dummification

data exploration
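The cleaning steps above can be sketched with pandas and scikit-learn. This is a minimal illustration, not the class material itself: the toy DataFrame, its column names (`age`, `city`, `bought`), and the choice of median and most-frequent imputation are all assumptions made for the example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data standing in for a practice dataset (hypothetical columns).
df = pd.DataFrame({
    "age": [25, 32, None, 41, 38, 29, None, 52],
    "city": ["NY", "SF", "NY", None, "LA", "SF", "LA", "NY"],
    "bought": [0, 1, 0, 1, 1, 0, 0, 1],
})

X = df.drop(columns="bought")
y = df["bought"]

# 1. Split first, so imputation statistics are learned from training rows only.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 2. Missing imputation: median for the numeric column, most frequent value
#    for the categorical column (an assumed strategy for this sketch).
age_median = X_train["age"].median()
city_mode = X_train["city"].mode().iloc[0]


def impute(frame):
    """Fill missing values using statistics computed on the training set."""
    out = frame.copy()
    out["age"] = out["age"].fillna(age_median)
    out["city"] = out["city"].fillna(city_mode)
    return out


X_train, X_val = impute(X_train), impute(X_val)

# 3. Dummification: one-hot encode the categorical column, then align the
#    validation columns to the training columns so both have the same layout.
X_train = pd.get_dummies(X_train, columns=["city"], dtype=int)
X_val = pd.get_dummies(X_val, columns=["city"], dtype=int).reindex(
    columns=X_train.columns, fill_value=0
)
```

Splitting before imputing keeps information from the validation rows out of the training statistics, which is why the split comes first in the list.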

Feature engineering

Modeling

fit a baseline model using default hyper-parameters

initial model evaluation (validation set)

Co-optimize:

  • hyper-parameter tuning
  • feature selection

Visualization of optimization
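As a rough sketch of the modeling steps above, the following fits a baseline with default hyper-parameters, scores it on a validation set, and then tunes hyper-parameters and feature selection together in one grid search. The synthetic data, the choice of logistic regression, `SelectKBest`, and the grid values are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic data standing in for a cleaned practice dataset.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Baseline: default hyper-parameters, evaluated on the validation set.
baseline = LogisticRegression().fit(X_train, y_train)
baseline_acc = baseline.score(X_val, y_val)

# Co-optimize: search over the number of selected features (select__k)
# and the regularization strength (model__C) in the same cross-validation.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("model", LogisticRegression()),
])
param_grid = {
    "select__k": [2, 4, 6, 10],
    "model__C": [0.01, 0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Compare the tuned pipeline with the baseline on the same validation set.
tuned_acc = search.score(X_val, y_val)
print("baseline accuracy:", baseline_acc)
print("tuned accuracy:", tuned_acc)
print("best parameters:", search.best_params_)
```

Putting the selector and the model in one `Pipeline` means each cross-validation fold reselects features on its own training portion. For the visualization step, `search.cv_results_` holds the per-combination scores and can be loaded into a pandas DataFrame for plotting.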

Apply the model results to the original business question
