Vanguard Investment Project

Project Overview

This project analyzes client behavior and key performance metrics for Vanguard's online investment process. The goal is to use A/B testing to determine whether the new user interface (UI) leads to a higher completion rate than the traditional UI.

Hypotheses

1st Hypothesis: Completion Rate Analysis

Null Hypothesis (H0): There is no significant difference in completion rates between the Test and Control groups.
Alternative Hypothesis (H1): The completion rate is significantly higher in the Test group compared to the Control group.
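One common way to test a directional hypothesis like this is a one-sided two-proportion z-test. The sketch below is an illustration only, not necessarily the method used in the notebooks; the function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_two_proportion_ztest(x_test, n_test, x_ctrl, n_ctrl):
    """Return (z, p_value) for H1: p_test > p_ctrl using a pooled z-test."""
    p_test = x_test / n_test
    p_ctrl = x_ctrl / n_ctrl
    # Under H0 both groups share one completion rate, so pool the counts.
    p_pool = (x_test + x_ctrl) / (n_test + n_ctrl)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_test + 1 / n_ctrl))
    z = (p_test - p_ctrl) / se
    # Upper-tail probability, because H1 is directional (Test > Control).
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical counts, for illustration only (not the project's results).
z, p = one_sided_two_proportion_ztest(x_test=600, n_test=1000, x_ctrl=550, n_ctrl=1000)
print(f"z = {z:.3f}, one-sided p = {p:.4f}")
```

H0 would be rejected when the p-value falls below the chosen significance level (for example 0.05).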

2nd Hypothesis: Completion Rate with Cost-Effectiveness Threshold

Null Hypothesis (H0): The increase in completion rate is less than 5%.
Alternative Hypothesis (H1): The increase in completion rate is at least 5%.
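A threshold hypothesis can be checked by shifting the difference in rates by the threshold before standardizing. The sketch below assumes the 5% is an absolute difference in percentage points (the text does not say whether it is absolute or relative); the function name and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def threshold_ztest(x_test, n_test, x_ctrl, n_ctrl, threshold=0.05):
    """Return (z, p_value) for H1: (p_test - p_ctrl) > threshold."""
    p_test = x_test / n_test
    p_ctrl = x_ctrl / n_ctrl
    # Unpooled standard error: H0 no longer says the two rates are equal.
    se = sqrt(p_test * (1 - p_test) / n_test + p_ctrl * (1 - p_ctrl) / n_ctrl)
    # Assumption: threshold is an absolute difference (0.05 = 5 percentage points).
    z = (p_test - p_ctrl - threshold) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical counts, for illustration only (not the project's results).
z, p = threshold_ztest(x_test=620, n_test=1000, x_ctrl=550, n_ctrl=1000)
print(f"z = {z:.3f}, one-sided p = {p:.4f}")
```

An observed lift above 5 points is not enough on its own: the p-value shows whether the excess over the threshold is distinguishable from sampling noise.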

3rd Hypothesis: Interaction Patterns

Null Hypothesis (H0): There is no difference in the number of actions taken between the Test and Control groups.
Alternative Hypothesis (H1): There is a significant difference in the number of actions taken between the Test and Control groups.
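A two-sided comparison of mean action counts can be sketched with Welch's t statistic. To stay within the standard library, the p-value below uses a normal approximation to the t distribution, which is only reasonable for large samples; the function name and data are hypothetical, and this is not necessarily the test used in the notebooks.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_test_large_sample(a, b):
    """Return (t, df, p_value) for a two-sided Welch comparison of means."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Normal approximation to the t distribution; reasonable only for large df.
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_value


# Hypothetical action counts per client, for illustration only.
test_actions = [5, 6, 7, 5, 6, 8, 7, 6] * 50
ctrl_actions = [4, 5, 6, 5, 4, 6, 5, 5] * 50
t, df, p = welch_test_large_sample(test_actions, ctrl_actions)
print(f"t = {t:.2f}, df = {df:.0f}, two-sided p = {p:.4g}")
```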

Data Sources

df_final_demo.csv: Client demographic data.
df_final_web_data_pt_1.csv: Web interaction data for the first period.
df_final_web_data_pt_2.csv: Web interaction data for the second period.
df_final_experiment_clients.csv: Information on clients involved in the experiment.

Data Preparation and Cleaning

Loading Data: Load and inspect datasets.
Univariate and Bivariate Analysis: Perform initial analysis to understand the relationships between variables and to detect outliers.
Cleaning Data: Handle missing values, remove duplicates, and address outliers.
Merging Datasets: Merge datasets to create a comprehensive dataset for analysis.
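The steps above can be sketched with pandas. The column names (`client_id`, `process_step`, `variation`) are assumptions and may not match the real CSV headers, and small in-memory frames stand in for the files so the example runs on its own.

```python
import pandas as pd

# Tiny in-memory stand-ins for the CSV files; column names are assumptions.
web_pt_1 = pd.DataFrame({"client_id": [1, 1, 2], "process_step": ["start", "confirm", "start"]})
web_pt_2 = pd.DataFrame({"client_id": [2, 3, 3], "process_step": ["start", "start", "confirm"]})
clients = pd.DataFrame({"client_id": [1, 2, 3, 4], "variation": ["Test", "Control", "Test", None]})

# Stack the two web-data periods and drop exact duplicate rows.
web = pd.concat([web_pt_1, web_pt_2], ignore_index=True).drop_duplicates()

# Keep only clients that were actually assigned to a group.
clients = clients.dropna(subset=["variation"])

# Inner join: retain web events belonging to experiment participants.
merged = web.merge(clients, on="client_id", how="inner")
print(merged.groupby("variation")["client_id"].nunique())
```

With the real files, the in-memory frames would be replaced by `pd.read_csv` calls on the datasets listed under Data Sources.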

Exploratory Data Analysis (EDA)

Demographic Analysis: Analyze client demographics (age, gender, tenure, balance).
Behavior Analysis: Study client behavior based on web interaction data.
High-Value Clients: Identify primary clients based on the A/B test groups and, before cleaning (with outliers still present), identify high-value clients.

Visualizations

1. Proof of Concept (PoC) Diagram

High-level overview of the project methodology, from data collection to analysis and conclusions.

2. Completion Rates (1st Hypothesis)

Comparison of completion rates between the Test and Control groups.

3. Completion Rate with Cost-Effectiveness Threshold (2nd Hypothesis)

Completion rate evaluated against a cost-effectiveness threshold.

4. Normalization Detection (3rd Hypothesis)

Histogram and Q-Q plot after removing outliers to bring the data closer to a normal distribution. Shapiro-Wilk test values of 0.94 and 0.95 (normal distribution).
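The text does not say how outliers were removed. One common choice is the 1.5 x IQR rule, sketched below as an assumption; the function name and data are hypothetical. For the normality test itself, SciPy provides `scipy.stats.shapiro`.

```python
from statistics import quantiles


def remove_iqr_outliers(values, k=1.5):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical action counts with one extreme value, for illustration only.
actions = [4, 5, 5, 6, 6, 6, 7, 7, 8, 40]
print(remove_iqr_outliers(actions))
```

In this toy sample the value 40 falls outside the upper fence and is dropped, while the remaining values are kept.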

5. Interaction Pattern (3rd Hypothesis)

Bar plot showing the average number of actions required to complete the process, per group.

6. Duration Assessment

Line graph showing normalization after 4 months, indicating that the duration of the experiment was sufficient.

7. Power Analysis on Completion Rate

Power analysis using Cohen's d effect size to assess whether the sample size was sufficient.
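As a rough illustration of how a required sample size follows from Cohen's d, the sketch below uses the standard normal-approximation formula n = 2(z_{1-alpha/2} + z_{power})^2 / d^2 for a two-sided, two-sample comparison. The effect size, alpha, and power are hypothetical defaults, not the values used in the project, and exact t-based calculations give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(d, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided, two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Hypothetical small effect size, for illustration only.
print(required_n_per_group(0.2))
```

Because n scales with 1/d^2, halving the effect size roughly quadruples the required sample per group.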

PowerBI and Streamlit

Design Effectiveness and Power Analysis

Duration Assessment: As shown in the line graph, 4 months were enough to reach normalization.
Power Analysis on Completion Rate: The required sample size per group was 3926, well below our actual sample sizes, so the samples were large enough for the experiment.

Overall, the A/B test was well structured: both samples had homogeneous traits (gender, balance, age), and the sample sizes and duration were sufficient to obtain tangible results.

Conclusion

The new user interface significantly improved the completion rate. The A/B test design was effective, and the sample size was sufficient to detect meaningful differences.

AlexRibeiro95/Vanguard-EDA
