The primary objective of this project is to use the Python libraries pandas, matplotlib, and seaborn to explore the data and extract insights from it, and to use xgboost and scikit-learn for machine learning.
A secondary objective is to learn how to fine-tune the parameters using grid search cross-validation for the xgboost machine learning model.
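As a rough illustration of grid search cross-validation, the sketch below tunes a small boosted-tree model with scikit-learn's GridSearchCV. The data is synthetic (generated with make_classification, not the loan dataset), the parameter grid is a hypothetical example rather than the grid used in this project, and the fallback to GradientBoostingClassifier is an assumption added only so the sketch runs where xgboost is not installed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV

try:
    from xgboost import XGBClassifier as Booster
except ImportError:
    # Assumption: fall back to scikit-learn's gradient boosting so the
    # sketch still runs without xgboost; it accepts the same three parameters.
    from sklearn.ensemble import GradientBoostingClassifier as Booster

# Synthetic stand-in for the loan data (NOT the real dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hypothetical grid: the values are illustrative, not tuned results.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}

# Every combination in the grid is scored with 3-fold cross-validation.
search = GridSearchCV(
    estimator=Booster(random_state=0),
    param_grid=param_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_  # combination with the best mean CV score
best_score = search.best_score_    # mean cross-validated accuracy for it
print(best_params, round(best_score, 3))
```

After fitting, search.best_estimator_ holds a model refitted on all the data with the winning parameters, which can then be reused for prediction.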
Ultimately, the goal is to predict whether a loan applicant can repay the loan using voting ensemble techniques that combine predictions from multiple machine learning algorithms.
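A voting ensemble of this kind could be sketched with scikit-learn's VotingClassifier. The three base models below (logistic regression, random forest, k-nearest neighbours) and the synthetic data are assumptions chosen for illustration; the project's actual model mix is not stated here, and a tuned xgboost classifier could be added to the list in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data standing in for the loan dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Hypothetical choice of base models; scale-sensitive ones get a scaler.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ],
    voting="soft",  # average predicted probabilities instead of counting votes
)
ensemble.fit(X_train, y_train)

predictions = ensemble.predict(X_test)     # 0/1 label per held-out row
accuracy = ensemble.score(X_test, y_test)  # fraction of correct predictions
print(f"held-out accuracy: {accuracy:.3f}")
```

Soft voting requires every base model to expose predict_proba; with voting="hard" the ensemble takes a majority vote over predicted labels instead.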
The dataset contains the following attributes: Loan ID, Gender, Marital Status, Dependents, Education, Self-Employment Status, Applicant Income, Coapplicant Income, Loan Amount, Credit History, Property Area, and Loan Status.
Income Trends: Married male applicants tend to have higher incomes than married female applicants, who have the lowest incomes.
Education Impact: Male graduates have higher incomes than non-graduates.
Marital and Educational Impact: Married graduates have the highest incomes among all groups.
Employment Status: Non-self-employed applicants have higher incomes than self-employed ones.
Dependents: Incomes are highest for applicants with no dependents and lowest for those with the most dependents.
Property and Credit History: Applicants with property in urban areas and a credit history tend to have the highest incomes.
Education and Credit History: Graduates with a credit history earn more than those without.
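Group comparisons like the income findings above are typically computed with a pandas groupby. The sketch below is a minimal example on a tiny made-up table; the column names (Gender, Married, Education, ApplicantIncome) are assumed, and the numbers are invented purely to show the mechanics, not taken from the dataset.

```python
import pandas as pd

# Made-up rows for illustration only; column names are assumptions.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Male", "Female"],
    "Married": ["Yes", "No", "Yes", "No", "Yes", "Yes"],
    "Education": ["Graduate", "Not Graduate", "Graduate",
                  "Graduate", "Graduate", "Not Graduate"],
    "ApplicantIncome": [6000, 3500, 4200, 3900, 7200, 2600],
})

# Mean income for each gender / marital-status combination, highest first.
income_by_group = (
    df.groupby(["Gender", "Married"])["ApplicantIncome"]
    .mean()
    .sort_values(ascending=False)
)
print(income_by_group)

# Median income by education level; the median is less sensitive to outliers.
income_by_education = df.groupby("Education")["ApplicantIncome"].median()
print(income_by_education)
```

The same pattern extends to the other comparisons by changing the grouping columns, for example grouping on property area and credit history together.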
Income and Loan Amount: Loan amounts increase approximately linearly with applicant income.
Correlation: Heatmaps indicate a strong positive correlation between applicant income and loan amount.
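One way to check an income and loan-amount relationship is to compute a Pearson correlation matrix and fit a straight line. The sketch below uses synthetic data in which the loan amount is deliberately built to rise with income, so its output only demonstrates the method. The column names and the plot_heatmap helper are hypothetical; the helper imports matplotlib and seaborn lazily so the numeric part runs without them.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic data: loan amount is constructed as a noisy linear
# function of income, so a strong correlation is expected by design.
income = rng.uniform(2000, 10000, size=200)
loan = 0.02 * income + rng.normal(0, 20, size=200)
coapplicant = rng.uniform(0, 4000, size=200)

df = pd.DataFrame({
    "ApplicantIncome": income,
    "CoapplicantIncome": coapplicant,
    "LoanAmount": loan,
})

# Pairwise Pearson correlations between the numeric columns.
corr = df.corr(method="pearson")
print(corr.round(2))

# Least-squares line: LoanAmount ~ slope * ApplicantIncome + intercept.
slope, intercept = np.polyfit(df["ApplicantIncome"], df["LoanAmount"], deg=1)


def plot_heatmap(corr_matrix):
    """Hypothetical helper: draw the correlation matrix as a heatmap."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(corr_matrix, annot=True, fmt=".2f",
                     cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_title("Correlation between numeric attributes")
    plt.tight_layout()
    plt.show()
```

Calling plot_heatmap(corr) renders the matrix; on real data, rows with missing loan amounts would need to be dropped or imputed before fitting the line.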
Gender Distribution: There are more male applicants than female applicants.
Marital Status: More applicants are married than unmarried.
Dependents: The majority of applicants have no dependents.
Education: There are more graduates than non-graduates among the applicants.
Property Area: Most properties are located in semi-urban areas, with the least in rural areas.
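Distribution findings like these usually come from simple category counts. The following sketch shows the idea with value_counts on a small made-up table; the column names (Gender, Property_Area) are assumed, the rows are invented for illustration, and the plot_counts helper is hypothetical.

```python
import pandas as pd

# Made-up rows for illustration only; column names are assumptions.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male", "Female", "Male"],
    "Property_Area": ["Semiurban", "Urban", "Semiurban",
                      "Rural", "Semiurban", "Urban"],
})

# Absolute number of applicants per category, most frequent first.
gender_counts = df["Gender"].value_counts()

# Share of applicants per property area (fractions that sum to 1).
area_share = df["Property_Area"].value_counts(normalize=True)

print(gender_counts)
print(area_share.round(3))


def plot_counts(frame, column):
    """Hypothetical helper: bar chart of category counts with seaborn."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.countplot(data=frame, x=column)
    ax.set_title(f"Applicants by {column}")
    plt.tight_layout()
    plt.show()
```

For example, plot_counts(df, "Property_Area") draws one bar per property area.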