
08 Chapter: Inferential Statistics

Mikiko Bazeley edited this page Dec 19, 2019 · 1 revision

Inferential Statistics

Overview

This is the second unit in the Exploratory Data Analysis theme. It focuses on the basics of statistical inference, hypothesis testing, regression, and correlation, along with applications such as A/B testing. Descriptive statistics is useful for discovering and communicating insights from data, while inferential statistics is useful for drawing conclusions and predicting outcomes. Both are part of exploratory data analysis and are used to build and understand data stories.

This is a large unit that is broken into several parts. First, we’ll brush up on the basics of probability and descriptive statistics. Then, we’ll learn to apply them using Python. You’ll practice these skills through a series of mini-projects before applying them to your capstone project. Review the “What will Help” section of the Unit Plan for more details, and reach out to your mentor and course TA if you struggle with the material.

Work to Submit:

Mini-project on human body temperature dataset

Mini-project on racial discrimination dataset

Mini-project on hospital readmissions dataset

Report on the inferential statistics methods used on the capstone project and its results

Unit Plan: Inferential Statistics

What You’ll Learn: Learning Objectives

  • Understand the fundamentals of statistical inference and hypothesis testing
  • Perform hypothesis testing for numeric and categorical data to identify statistical significance
  • Understand how hypothesis testing is applied in real-world applications such as A/B testing

Words to Know: Key Terms & Concepts

  • Hypothesis: An assumption made about the world that can be tested using data.
  • Statistical Inference: A branch of statistics dedicated to drawing conclusions about a population from a smaller sample of data.
  • Confidence Intervals: An interval estimate used to express the degree of uncertainty associated with a sample statistic.
  • Statistical Significance: A judgment about whether an observed result is unlikely to have arisen by chance alone. It is usually assessed with a p-value, the probability of seeing a result at least as extreme as the observed one if the null hypothesis were true. The smaller the p-value, the stronger the evidence against the null hypothesis. Statistical significance does not by itself imply real-world importance.
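
To make the last two terms concrete, here is a minimal sketch that computes a confidence interval for a mean and a two-sided p-value using only the Python standard library. The function names and the sample values are hypothetical, made up for illustration. The sketch assumes a normal approximation; for a sample this small, a t distribution would be the more usual choice.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    # Standard error: sample standard deviation divided by sqrt(n).
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value (about 1.96 for 95% confidence).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return sample_mean - z * standard_error, sample_mean + z * standard_error


def two_sided_p_value(sample, hypothesized_mean):
    """Two-sided z-test p-value for H0: population mean == hypothesized_mean."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    z = (statistics.mean(sample) - hypothesized_mean) / standard_error
    # Probability of a result at least this far from H0 in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical measurements, invented for this example only.
sample = [98.1, 98.4, 97.9, 98.6, 98.2, 98.0, 98.3, 98.5, 97.8, 98.2,
          98.4, 98.1, 98.3, 98.0, 98.2, 98.6, 97.9, 98.3, 98.1, 98.2]

low, high = mean_confidence_interval(sample)
p_value = two_sided_p_value(sample, hypothesized_mean=98.6)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
print(f"p-value against a hypothesized mean of 98.6: {p_value:.4f}")
```

A small p-value here only says the sample is hard to reconcile with the hypothesized mean; whether the difference matters in practice is a separate question.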

What will Help

  • Don't worry if some of the material, especially the math, feels a bit difficult. As long as you have a good intuitive understanding of the algorithms, you'll be OK!
  • There are several mini-projects in this unit that will support your learning.
  • For a refresher on basic probability and descriptive statistics, use the Khan Academy Probability track
  • Keep the cleaned and wrangled data from Capstone Project 1 ready for use in this unit

8.1 - Foundations of Statistical Inference

The following resources from Khan Academy give you a solid foundation in statistical inference. The material can be a bit dry, so learn it at your own pace and reach out to your TA with any questions. You may be tested on some of these concepts in technical job interviews, so take the time to understand them thoroughly.

(see Khan Academy)


8.2 - Inferential Statistics Using Python

In the previous section, you learned the mathematical theories and concepts that form the foundation of inferential statistics. In this section, you’ll learn how to apply them using Python.

1 Interactive Exercises: Statistical Thinking in Python (Part 1)
Students typically spend 3-5 hours
https://www.datacamp.com/courses/statistical-thinking-in-python-part-1

After acquiring data and getting it into a usable form, you ultimately want to draw clear, succinct conclusions from it. This step of a data analysis pipeline hinges on the principles of statistical inference. In this DataCamp resource, you’ll build the foundation you need to think statistically, speak the language of your data, and understand what the data are telling you. The foundations of statistical thinking took decades to build, but they can be grasped much faster today with the help of computers. With the power of Python-based tools, you’ll rapidly get up to speed and begin thinking statistically by the end of this course.

2 Interactive Exercises: Statistical Thinking in Python (Part 2)
Students typically spend 4-6 hours
https://www.datacamp.com/courses/statistical-thinking-in-python-part-2

Now that you have the probabilistic mindset and foundational hacker stats skills to dive into datasets and extract useful information, you’ll use this DataCamp resource to perform two key tasks in statistical inference: parameter estimation and hypothesis testing. You’ll work with real datasets to analyze the beak measurements of Darwin's famous finches.
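
As a rough illustration of those two tasks in the simulation-based "hacker stats" style, the sketch below estimates a mean with a percentile bootstrap confidence interval and compares two groups with a permutation test. It is not taken from the DataCamp course: the function names, the default resample counts, and the measurements are all assumptions made up for this example, and it uses only the standard library rather than NumPy.

```python
import random
import statistics


def bootstrap_ci(data, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean (parameter estimation)."""
    rng = random.Random(seed)
    # Each replicate is the mean of a resample drawn with replacement.
    replicates = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    tail = (1 - confidence) / 2
    lower = replicates[round(tail * n_resamples)]
    upper = replicates[round((1 - tail) * n_resamples) - 1]
    return lower, upper


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """One-sided p-value for H0: both groups come from the same distribution.

    The test statistic is mean(group_b) - mean(group_a).
    """
    rng = random.Random(seed)
    observed = statistics.fmean(group_b) - statistics.fmean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled data simulates the null hypothesis:
        # group labels carry no information.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[n_a:]) - statistics.fmean(pooled[:n_a])
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical measurements for two groups, invented for illustration.
group_a = [8.8, 9.1, 9.4, 8.7, 9.0, 9.3, 8.9, 9.2, 9.0, 9.1]
group_b = [9.5, 9.8, 9.4, 9.9, 9.6, 10.0, 9.7, 9.5, 9.8, 9.6]

ci_low, ci_high = bootstrap_ci(group_a)
p_value = permutation_test(group_a, group_b)
print(f"Bootstrap 95% CI for mean of group A: ({ci_low:.2f}, {ci_high:.2f})")
print(f"Permutation test p-value: {p_value:.4f}")
```

Fixing the seed makes the resampling reproducible, which helps when you write up results for a mini-project.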


8.3 - Exploratory Data Analysis Projects

Now that you have a foundation in inferential statistics and hypothesis testing, it's time to see those ideas in action. The following mini-projects walk you through how hypothesis testing can be used to elicit insights from data and create good data stories. You’ll use concepts that you learned in both Exploratory Data Analysis units (Units 7 and 8).

The three mini-projects in this unit have a similar structure but require different techniques. You may need to clean and wrangle the data, perform some visual analysis, and then run statistical tests to answer the posed problem. Finally, you’ll put everything together into a coherent story summarizing your approach and conclusions. At the end of these mini-projects, you should have a clearer idea of how various visualization and statistical techniques can work together to create a data story.

Submit your results using the links below and discuss them with your mentor on the next call. Remember, if you’re feeling stuck, you can always reach out to your course TA for feedback on technical questions and code reviews.

Resources useful for me in solving the mini-projects:

Human Temperature EDA:

General:

8.4 - A/B Testing

A/B testing is a form of hypothesis testing: a randomized experiment with two variants. It has recently gained prominence in web and mobile design.
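
One common way to analyze such an experiment is a two-proportion z-test on conversion rates. The sketch below is a minimal version under stated assumptions: the function name and the visitor and conversion counts are hypothetical, and the normal approximation it relies on is reasonable only when both variants have plenty of conversions and non-conversions.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for H0: variants A and B have the same conversion rate."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Under H0 both variants share one rate, estimated from the pooled data.
    pooled_rate = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = math.sqrt(
        pooled_rate * (1 - pooled_rate) * (1 / visitors_a + 1 / visitors_b)
    )
    z = (rate_b - rate_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 2,000 visitors were randomly assigned to each variant.
z, p_value = two_proportion_z_test(
    conversions_a=200, visitors_a=2000,
    conversions_b=260, visitors_b=2000,
)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The random assignment of visitors to variants is what justifies reading a small p-value as evidence that the design change, rather than some other difference between the groups, moved the conversion rate.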

Required Readings:

8.5 - Apply Inferential Statistics

8.6 - Wrap-Up Inferential Statistics
