This hands-on tutorial introduces comprehensive Exploratory Data Analysis (EDA) techniques that help you understand your data better before moving on to serious tasks such as machine learning or deep learning.
- Students who want to become data scientists
- Junior data scientists
- Machine-learning researchers
- Some experience with Python, Pandas, Matplotlib, and Jupyter Notebook (or similar)
- GitHub & Google accounts
Fork this repo, then go to: https://colab.research.google.com/github/{your_github_id}/pydata2021-eda/
- Introduction
- Data loading and preprocessing
  - Loading a CSV file
  - Merging many CSV files
  - Essential checks: #Samples, Column Names, Unique Values, Missing Values, etc.
    - `sidetable`
  - Preprocessing & Feature Engineering
    - Handling missing values
    - Extracting features
- Statistical Visualizations
  - `matplotlib`: basic building block, essential for fine-tuning
  - `pandas`: data manipulation + plotting
  - `seaborn`: handy `matplotlib` wrapper for statistical visualizations
- (Easy Enough) Interactive Visualizations
  - `ipywidgets`
  - `plot.ly` and `plot.ly` express
  - `bokeh`
  - `altair`
- Automatic EDA Report
  - `dtale`
  - `pandas-profiling`
  - `sweetviz`
  - `autoviz`
- Wrap-up and Some Tips
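
To give a flavor of the first steps in the outline above (merging CSV files, the essential checks, and handling missing values), here is a minimal sketch in plain pandas. It is not taken from the tutorial notebooks: the in-memory CSV data, the column names `city` and `temperature`, and the choice of median imputation are all assumptions made for illustration.

```python
import io

import pandas as pd

# Two tiny in-memory CSV "files" (made-up data, for illustration only).
csv_part_1 = io.StringIO("city,temperature\nParis,21.5\nLyon,\n")
csv_part_2 = io.StringIO("city,temperature\nParis,19.0\nNice,25.1\n")

# Load each CSV and merge them into a single DataFrame.
frames = [pd.read_csv(part) for part in (csv_part_1, csv_part_2)]
df = pd.concat(frames, ignore_index=True)

# Essential checks: #Samples, column names, unique values, missing values.
n_samples = len(df)
column_names = list(df.columns)
unique_counts = df.nunique()      # distinct non-missing values per column
missing_counts = df.isna().sum()  # number of missing values per column

print(f"#Samples: {n_samples}")
print(f"Columns: {column_names}")
print("Unique values per column:")
print(unique_counts)
print("Missing values per column:")
print(missing_counts)

# One simple way to handle missing values (an assumption, not the only option):
# fill the numeric column with its median.
df_filled = df.fillna({"temperature": df["temperature"].median()})
print(df_filled)
```

With real files, the `io.StringIO` objects would typically be replaced by file paths, for example those returned by `glob.glob` over a data directory.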
Sin-seok SEO @Safran Tech, Safran SA