Data Science Question Answer

The purpose of this repo is twofold:

To help you (data science practitioners) prepare for data science related interviews
To introduce basic data science concepts to people who don't know them yet but want to learn

The focus is on breadth of knowledge, so this is more of a quick reference than in-depth study material. If you want to learn a specific topic in detail, please refer to other content or reach out and I'd love to point you to materials I found useful.

I might add some topics from time to time but hey, this should also be a community effort, right? Any pull request is welcome!

Here are the categories:

Resume
SQL
Tools and Framework
Statistics and ML In General
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Natural Language Processing
System

Resume

The only advice I can give about your resume is to describe your past data science / machine learning projects in a specific, quantifiable way. Consider the following two statements:

Trained a machine learning system

and

Designed and deployed a deep learning model to recognize objects using Keras, TensorFlow, and Node.js. The model has 1/30 the model size, 1/3 the training time, 1/5 the inference time, and 2x faster convergence compared with traditional neural networks (e.g., ResNet)

The second is much better because it quantifies your contribution and also highlights specific technologies you used (and therefore have expertise in). This would require you to log what you've done during experiments. But don't exaggerate.

Spend some time going over your resume / past projects to make sure you can explain them well.

SQL

Difference between joins

(INNER) JOIN: Returns records that have matching values in both tables
LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table (NULLs where there is no match)
RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table (NULLs where there is no match)
FULL (OUTER) JOIN: Returns all records from both tables, matched where possible, with NULLs on whichever side has no match
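
The four join types can be tried out on a tiny dataset. Below is a minimal sketch using Python's built-in sqlite3 module. The employees and departments tables, their columns, and the rows helper are all hypothetical, made up for illustration. Because older SQLite versions do not support RIGHT or FULL OUTER JOIN directly, the sketch emulates them with LEFT JOIN (tables swapped) and UNION ALL, which is an assumption of this example and not the only way to write them.

```python
import sqlite3

# Hypothetical example tables: employees and departments, linked by dept_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept_id INTEGER);
    CREATE TABLE departments (dept_id INTEGER, dept_name TEXT);
    INSERT INTO employees VALUES ('Ann', 1), ('Bob', 2), ('Cy', NULL);
    INSERT INTO departments VALUES (1, 'Data'), (3, 'Ops');
""")


def rows(sql):
    """Run a query and return its rows as a set (row order is not guaranteed)."""
    return set(conn.execute(sql).fetchall())


# INNER JOIN: only rows with a match in both tables.
inner = rows("""
    SELECT e.name, d.dept_name
    FROM employees e
    JOIN departments d ON e.dept_id = d.dept_id
""")

# LEFT JOIN: every employee, with NULL (None) where no department matches.
left = rows("""
    SELECT e.name, d.dept_name
    FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.dept_id
""")

# RIGHT JOIN, emulated by swapping the tables in a LEFT JOIN:
# every department, with NULL where no employee matches.
right = rows("""
    SELECT e.name, d.dept_name
    FROM departments d
    LEFT JOIN employees e ON e.dept_id = d.dept_id
""")

# FULL OUTER JOIN, emulated as the LEFT JOIN plus the departments
# that have no matching employee.
full = rows("""
    SELECT e.name, d.dept_name
    FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.dept_id
    UNION ALL
    SELECT e.name, d.dept_name
    FROM departments d
    LEFT JOIN employees e ON e.dept_id = d.dept_id
    WHERE e.dept_id IS NULL
""")

for label, result in [("inner", inner), ("left", left),
                      ("right", right), ("full", full)]:
    print(label, sorted(result, key=repr))
```

Note that Cy has a NULL dept_id, so Cy never matches a department: a comparison against NULL is never true in SQL. Cy therefore appears only in the left and full results.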
