datascienceid/data-science-learning-path

Data Science Learning Path

Machine Learning

Chapter 01 - Introduction

  1. What are machine learning, artificial intelligence, and data science?
  2. Which problems can be solved with machine learning?
  3. Fields related to machine learning
  4. What do you need to master to become a machine learning practitioner?

Chapter 02 - Regression

  1. Introduction to regression (including evaluation metrics, e.g. MSE and MAE)
  2. Simple linear regression
  3. Polynomial regression
  4. Regularized regression
  5. Support vector regression
  6. Generalized linear model

Chapter 03 - Classification

  1. Introduction to classification and the confusion matrix
  2. Logistic regression
  3. LDA (Linear Discriminant Analysis)
  4. k-NN (k-Nearest Neighbors)
  5. Naive Bayes
  6. Decision tree
  7. Support vector machine
  8. Neural networks

Chapter 04 - Clustering

  1. Introduction to clustering
  2. k-means clustering
  3. EM (Expectation-Maximization) clustering
  4. Hierarchical clustering

Chapter 05 - Kernel Methods

  1. Introduction to kernel methods
  2. Kernel k-means
  3. Kernel SVM
  4. Kernel regression

Chapter 06 - Data Preprocessing

  1. Feature engineering
  2. Data transformation
  3. Data cleaning
  4. Dimensionality reduction (PCA, LDA)
  5. Variable selection

Deep Learning

Chapter 01 - Introduction

  1. Introduction to deep learning and tools

Chapter 02 - Deep Learning Model

  1. CNN (Convolutional Neural Networks): case study on MNIST digit classification
  2. RNN (Recurrent Neural Networks)
  3. Generative models: GAN (Generative Adversarial Networks) and autoencoders

Chapter 03 - State of the Art Model

  1. Deep learning object detection: SSD, YOLO, Mask R-CNN
  2. Deep learning image segmentation: FCN, SegNet, Mask R-CNN

Text Mining and Natural Language Processing

Chapter 01 - Introduction to Text Mining and NLP

  1. Overview of Text Mining and NLP
  2. Corpus
  3. Dictionary

Chapter 02 - Feature Extraction

  1. Feature extraction
  2. Bag of words
  3. Term-document matrix
  4. Term frequency and weighting
  5. TF-IDF

Chapter 03 - Grammar

  1. POS Tagging
  2. Named Entity Recognition

Chapter 04 - Text Classification

  1. Overview Text Classification
  2. Binary Classification
  3. Multiclass Classification
  4. Multilabel Classification

Chapter 05 - Deep NLP

  1. Information Retrieval
  2. Text Clustering
  3. Document Similarity
  4. Topic modeling
  5. Word2Vec
  6. Skip-gram
  7. CBOW
  8. Language Modeling
  9. Natural Language Understanding
  10. Natural Language Generation

Computer Vision

Chapter 01 - Introduction

  1. Introduction to computer vision and tools
  2. Representation of images and video in a computer

Chapter 02 - Image Thresholding

  1. Binary thresholding
  2. Otsu thresholding

Chapter 03 - Spatial Filtering

  1. Introduction to spatial filtering
  2. Smoothing (averaging filter)
  3. Sharpening
  4. Median filter
  5. Sobel filter

Chapter 04 - Morphological Processing

  1. Erosion
  2. Dilation
  3. Morphological opening & closing

Chapter 05 - Image Analysis

  1. Connected component analysis
  2. Image segmentation
  3. Object detection: case study on face detection

Automatic Speech Recognition

Chapter 01 - Introduction

  1. Overview Speech Recognition

Chapter 02 - Signal Processing

  1. MFCC
  2. LPC
  3. Noise Reduction

Advanced Speech Recognition

  1. Speech Recognition for Low Resource
  2. Large Vocabulary Continuous Speech Recognition
  3. Speaker Identification
  4. Speech Enhancement
  5. Speech separation

Data Visualization

Chapter 01 - Introduction

  1. Overview Data Visualization
  2. Principles of Data Visualization
  3. Overview Chart

Chapter 02 - Charts

  1. Pie Chart
  2. Line Chart
  3. Bar Chart
  4. Stacked Bar Chart
  5. Heat Map
  6. Bubble Chart
  7. Area Charts
  8. Box Plot
  9. Whisker plot
  10. Scatter Plot
  11. GeoSpatial
  12. Real Time Data Visualization

Toolbox

Chapter 01 - Toolbox

  1. MS Excel with Analysis ToolPak
  2. Java, Python
  3. R, RStudio, Rattle
  4. Weka, KNIME, RapidMiner
  5. Hadoop distribution of choice
  6. Spark, Storm
  7. Flume, Scribe, Chukwa
  8. Nutch, Talend, ScraperWiki
  9. Webscraper, Flume, Sqoop
  10. tm, RWeka, NLTK
  11. RHIPE
  12. D3.js, ggplot2, Shiny
  13. IBM LanguageWare
  14. Microsoft Azure, AWS, Google Cloud
  15. Cassandra, MongoDB
  16. Microsoft Cognitive API
  17. TensorFlow
  18. Git

Database

  1. Introduction to Databases
  2. Basic SQL
  3. Intermediate SQL
  4. Advanced SQL

Building Blocks

Chapter 01 - Fundamentals

  1. Matrices, Vector & Algebra fundamentals
  2. Hash function, binary tree, O(n)
  3. Relational algebra, DB basics (with SQL)
  4. Inner, Outer, Cross, theta-join
  5. CAP theorem
  6. Tabular data
  7. Entropy
  8. Data frames & series
  9. Sharding
  10. OLAP
  11. Multidimensional Data model
  12. ETL
  13. Reporting vs BI vs Analytics
  14. JSON and XML
  15. NoSQL
  16. Regex

Chapter 02 - Statistics

  1. Pick a dataset
  2. Descriptive statistics
  3. Exploratory data analysis
  4. Histograms
  5. Percentiles & outliers
  6. Probability theory
  7. Bayes theorem
  8. Random variables
  9. Cumulative distribution function (CDF)
  10. Continuous distributions
  11. Skewness
  12. ANOVA
  13. Probability density function (PDF)
  14. Central Limit theorem
  15. Monte Carlo method
  16. Hypothesis Testing
  17. p-Value
  18. Chi-squared test
  19. Estimation
  20. Confidence intervals (CI)
  21. MLE
  22. Kernel Density estimate
  23. Regression
  24. Covariance
  25. Correlation
  26. Pearson coefficient
  27. Causation
  28. Least-squares fit
  29. Euclidean distance
  30. Measures of central tendency
  31. Measures of spread

Chapter 03 - Programming

  1. Python Basics
  2. Working in Excel
  3. R setup / RStudio
  4. R basics
  5. Expressions
  6. Variables
  7. IBM SPSS
  8. RapidMiner
  9. Vectors
  10. Matrices
  11. Arrays
  12. Factors
  13. Lists
  14. Data frames
  15. Reading CSV data
  16. Reading raw data
  17. Subsetting data
  18. Manipulate data frames
  19. Functions
  20. Factor analysis
  21. Installing packages
  22. Code versioning
  23. Data Table

Chapter 04 - Big Data

  1. Map Reduce fundamentals
  2. Hadoop Components
  3. HDFS
  4. Data replication principles
  5. Setup Hadoop
  6. Name & data nodes
  7. Job & task tracker
  8. M/R programming
  9. Sqoop: loading data into HDFS
  10. Flume, Scribe
  11. SQL with Pig
  12. DWH with Hive
  13. Scribe, Chukwa for Weblog
  14. Using Mahout
  15. ZooKeeper, Avro
  16. Storm: Hadoop Realtime
  17. RHadoop, RHIPE
  18. RMR
  19. Cassandra
  20. MongoDB, Neo4j

Chapter 05 - Data Munging

  1. Summary of data formats
  2. Data discovery
  3. Data sources & Acquisition
  4. Data integration
  5. Data fusion
  6. Transformation & enrichment
  7. Data survey
  8. Google OpenRefine
  9. How much data?
  10. Using ETL
  11. Dimensionality and numerosity reduction
  12. Normalization
  13. Data scrubbing
  14. Handling missing Values
  15. Unbiased estimators
  16. Binning Sparse Values
  17. Feature extraction
  18. Denoising
  19. Sampling
  20. Stratified sampling
  21. PCA

Chapter 06 - Python

  1. Intro Python
  2. Set Up Environment
  3. Data Structure
  4. Iteration & Conditional
  5. Intro Libraries
  6. Function
  7. OOP
  8. Package
  9. NumPy
