- Project Overview
- Hypotheses
- Data Sources
- Data Preparation and Cleaning
- Exploratory Data Analysis (EDA)
- Visualizations
- Design Effectiveness and Power Analysis
- Conclusion
- Project Management and Presentation
- Contact
- This project aims to analyze client behavior and key performance metrics for Vanguard's online investment process. The goal is to use A/B testing to determine whether the new user interface (UI) leads to a higher completion rate than the traditional UI.
- Null Hypothesis (H0): There is no difference in completion rates between the Test and Control groups.
- Alternative Hypothesis (H1): The completion rate is higher in the Test group than in the Control group.
- Null Hypothesis (H0): The increase in completion rate is less than 5%.
- Alternative Hypothesis (H1): The increase in completion rate is at least 5%.
- Null Hypothesis (H0): There is no difference in the number of actions taken between the Test and Control groups.
- Alternative Hypothesis (H1): There is a difference in the number of actions taken between the Test and Control groups.
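The three hypothesis pairs above map onto standard two-sample tests. Below is a minimal standard-library sketch, not the project's own code: a one-sided two-proportion z-test for completion rate, with an optional `margin` for the 5% threshold pair (read here as an absolute lift of 5 percentage points, which is an assumption; the threshold could also be relative), and a Welch-style comparison of mean actions per client that uses a normal approximation, so it is only reasonable for large groups. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def completion_rate_ztest(conv_test, n_test, conv_ctrl, n_ctrl, margin=0.0):
    """One-sided two-proportion z-test of H1: p_test - p_ctrl > margin.

    margin=0.0 covers the first hypothesis pair; margin=0.05 covers the
    5% threshold pair, ASSUMING the threshold is an absolute lift of
    5 percentage points. Returns (z, one-sided p-value).
    """
    p_test = conv_test / n_test
    p_ctrl = conv_ctrl / n_ctrl
    if margin == 0.0:
        # H0 says the rates are equal, so pool them for the standard error.
        pooled = (conv_test + conv_ctrl) / (n_test + n_ctrl)
        se = sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_ctrl))
    else:
        # H0 allows different rates, so use the unpooled standard error.
        se = sqrt(p_test * (1 - p_test) / n_test + p_ctrl * (1 - p_ctrl) / n_ctrl)
    z = (p_test - p_ctrl - margin) / se
    return z, 1 - NormalDist().cdf(z)


def actions_welch_test(actions_test, actions_ctrl):
    """Two-sided Welch-style test for a difference in mean actions per client.

    Uses a normal approximation to the t distribution (large groups only).
    Returns (z, two-sided p-value).
    """
    se = sqrt(variance(actions_test) / len(actions_test)
              + variance(actions_ctrl) / len(actions_ctrl))
    z = (mean(actions_test) - mean(actions_ctrl)) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))
```

For example, with purely illustrative counts (not the project's data), `completion_rate_ztest(700, 1000, 650, 1000)` gives a z of about 2.39 and a one-sided p-value below 0.01.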
- df_final_demo.csv: Client demographic data.
- df_final_web_data_pt_1.csv: Web interaction data for the first period.
- df_final_web_data_pt_2.csv: Web interaction data for the second period.
- df_final_experiment_clients.csv: Information on clients involved in the experiment.
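A minimal pandas sketch for loading these files, assuming all four sit together in one directory under the names listed above. Because the two web-data files cover consecutive periods, the sketch stacks them into a single event log. `load_sources` is a hypothetical helper name, not taken from the project.

```python
from pathlib import Path

import pandas as pd


def load_sources(data_dir="."):
    """Load the four project CSVs and stack the two web-data periods.

    ASSUMES all four files live in `data_dir` under their documented names.
    Returns (demo, web, experiment) DataFrames.
    """
    data_dir = Path(data_dir)
    demo = pd.read_csv(data_dir / "df_final_demo.csv")
    web = pd.concat(
        [
            pd.read_csv(data_dir / "df_final_web_data_pt_1.csv"),
            pd.read_csv(data_dir / "df_final_web_data_pt_2.csv"),
        ],
        ignore_index=True,  # renumber rows so the index stays unique after stacking
    )
    experiment = pd.read_csv(data_dir / "df_final_experiment_clients.csv")
    return demo, web, experiment
```

Calling `demo.info()` or `web.head()` on the results is a quick way to inspect column types and missing values before cleaning.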
- Loading Data: Load and inspect datasets.
- Univariate and Bivariate Analysis: Perform an initial analysis to understand relationships between variables and detect outliers.
- Cleaning Data: Handle missing values, remove duplicates, and address outliers.
- Merging Datasets: Merge datasets to create a comprehensive dataset for analysis.
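The cleaning and merging steps could look roughly like the sketch below. It assumes every table shares a `client_id` key and that the experiment file stores the group label in a `Variation` column; both are assumptions about the raw data. The README does not say how outliers were handled, so the interquartile-range (Tukey) filter shown here is a swapped-in example technique, not necessarily the project's method. Both function names are hypothetical.

```python
import pandas as pd


def clean_and_merge(demo, web, experiment):
    """Deduplicate the raw tables and join them into one event-level table.

    ASSUMES every table has a `client_id` key and that `experiment`
    holds the Test/Control label in a `Variation` column.
    """
    demo = demo.drop_duplicates(subset="client_id")
    web = web.drop_duplicates()  # drop exact repeated log rows
    # Clients without a group label cannot enter the Test/Control comparison.
    experiment = (experiment.dropna(subset=["Variation"])
                  .drop_duplicates(subset="client_id"))
    clients = experiment.merge(demo, on="client_id", how="inner",
                               validate="one_to_one")
    # One row per web event, carrying the client's group and demographics.
    return web.merge(clients, on="client_id", how="inner",
                     validate="many_to_one")


def drop_iqr_outliers(df, column, k=1.5):
    """Keep rows whose `column` lies within [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[df[column].between(q1 - k * iqr, q3 + k * iqr)]
```

The `validate` arguments make pandas raise an error if a supposedly unique key is duplicated, which catches silent row multiplication during the merge.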
- Demographic Analysis: Analyze client demographics (age, gender, tenure, balance).
- Behavior Analysis: Study client behavior based on web interaction data.
- High-Value Clients: Identified the primary clients participating in the A/B test and, before cleaning, identified high-value clients (including outliers).
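For the demographic and behavior analysis, per-group summaries can be derived from a merged event-level table. The sketch below assumes a `process_step` column whose final step is labelled `"confirm"`, a `Variation` group column, and that every logged step counts as one action; these definitions and the function names are hypothetical, not confirmed by the README.

```python
import pandas as pd


def completion_by_group(events, final_step="confirm"):
    """Share of clients in each Variation who reached `final_step` at least once.

    ASSUMES `events` has `Variation`, `client_id` and `process_step` columns.
    """
    flagged = events.assign(completed=events["process_step"].eq(final_step))
    per_client = flagged.groupby(["Variation", "client_id"])["completed"].any()
    return per_client.astype(float).groupby(level="Variation").mean()


def actions_per_client(events):
    """Number of logged steps per client, ASSUMING each log row is one action."""
    return events.groupby(["Variation", "client_id"]).size()


def demographics_by_group(clients, columns):
    """Mean of each numeric demographic column per Variation (quick balance check)."""
    return clients.groupby("Variation")[columns].mean()
```

Comparing the `demographics_by_group` output for columns such as age, tenure, and balance is one simple way to check that the Test and Control samples are similar before comparing outcomes.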
- High-level overview of the project methodology, from data collection to analysis and conclusions.
- Comparison of completion rates between the Test and Control groups.
- Completion rate compared against the cost-effectiveness threshold.
- Histogram and Q-Q plot after removing outliers, which brings the data closer to normality. Shapiro-Wilk test values of 0.94 and 0.95 (consistent with a normal distribution).
- Bar plot showing the average number of actions required to complete the process in each group.
- Line graph showing normalization after 4 months, indicating that the experiment ran long enough.
- Power analysis using Cohen's d effect size to assess whether the sample size was sufficient.
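Cohen's d, the effect size used in the power analysis, is the difference between the two group means divided by their pooled standard deviation. A minimal standard-library sketch follows; applying it to completion rate would mean passing per-client 0/1 completion flags, which is an assumption about how the project encoded the metric. `cohens_d` is a hypothetical name.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: mean difference over the pooled standard deviation.

    The pooled variance weights each group's sample variance (n - 1
    denominator) by its degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
```

By Cohen's conventional benchmarks, values near 0.2, 0.5 and 0.8 are read as small, medium and large effects.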
- Duration Assessment: As shown in the line graph, 4 months were indeed enough to reach normalization.
- Power Analysis on Completion Rate: The required sample size per group was 3,926, well below our actual group sizes, so the samples were large enough for the experiment.
- Overall, the A/B test was well structured: both samples had homogeneous traits (gender, balance, age) and sufficient size, and the test ran long enough to obtain tangible results!
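The required sample size in a power analysis follows from the effect size d, the significance level alpha, and the desired power 1 - beta. The sketch below uses the normal-approximation formula n = 2((z_alpha + z_beta) / d)^2 per group. The defaults (alpha = 0.05, power = 0.8) are common conventions, not values stated in this README, so the sketch is not expected to reproduce the 3,926 figure; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(effect_size, alpha=0.05, power=0.8, one_sided=False):
    """Approximate per-group sample size for a two-sample comparison of means.

    Normal approximation: n = 2 * ((z_alpha + z_beta) / d) ** 2, rounded up.
    The alpha and power defaults are conventional, NOT the project's settings.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if one_sided else z(1 - alpha / 2)
    z_beta = z(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)
```

For a medium effect (d = 0.5) with the defaults, `required_n_per_group(0.5)` returns 63 per group. An exact t-based answer, such as the one from `statsmodels.stats.power.TTestIndPower().solve_power`, is slightly larger because the t distribution has heavier tails than the normal.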
- The new user interface significantly improved the completion rate. The A/B test design was effective, and the sample size was sufficient to detect meaningful differences.