Socio-economic factors associated with the number of suicides in the world

A Python-based data analysis project on the topic "Socio-economic factors associated with the number of suicides in the world".

Project structure

The "datasets" folder contains the datasets used in the study.
Data preprocessing steps are documented in the Markdown cells of the Jupyter notebook.
The "models" folder contains the saved trained ML models, the data for them, and a dictionary for decoding categorical variables.
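
A minimal sketch of how such a decoding dictionary could be saved and loaded with pickle. The file name and the dictionary contents are hypothetical and are not taken from the repository.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical mapping from integer codes back to category labels;
# the real dictionary in the "models" folder may look different.
decoder = {"sex": {0: "female", 1: "male"}}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "decoder.pkl"  # hypothetical file name

    # Serialize the object to disk.
    with path.open("wb") as fh:
        pickle.dump(decoder, fh)

    # Load it back; only unpickle files from a trusted source.
    with path.open("rb") as fh:
        restored = pickle.load(fh)

print(restored["sex"][1])
```

A trained sklearn model can be stored the same way, provided it is loaded with a compatible sklearn version.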

Intro

This project was motivated by the goal of preventing, or at least reducing, suicides worldwide. It was completed as the final project for the HSE University course "Data Analysis on Python".
Presentation, full version

Methods and python packages applied

Data processing (pandas)
Interactive plots (plotly)
Static plots (matplotlib)
Statistical tests (scipy.stats)
Linear regression and multiple comparison (statsmodels)
Machine learning (Decision Tree model and validation) (sklearn)
Serialization of the trained model (pickle)

Datasets

Three datasets are used, joined by country and year of observation:

Suicide Rates Overview (1985 to 2021), the main dataset.
The suicides_per_100k variable from this dataset is used as the target.
Global Trends in Mental Health Disorder.
Variables for different mental disorders are used, mainly depression and alcoholism rates.
Inflation, Interest and Unemployment Rate.
Mainly the unemployment and inflation rates are used.
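
Joining by country and year can be sketched with pandas as follows. The toy frames, the column names other than suicides_per_100k, and the choice of an inner join are assumptions made for illustration; the notebook may do this differently.

```python
import pandas as pd

# Toy stand-ins for the three datasets; column names are assumptions.
suicides = pd.DataFrame({
    "country": ["A", "A", "B"],
    "year": [2000, 2001, 2000],
    "suicides_per_100k": [10.0, 11.0, 5.0],
})
mental = pd.DataFrame({
    "country": ["A", "A", "B"],
    "year": [2000, 2001, 2000],
    "depression_rate": [3.1, 3.2, 4.0],
})
economy = pd.DataFrame({
    "country": ["A", "B"],
    "year": [2000, 2000],
    "unemployment_rate": [6.5, 9.0],
})

keys = ["country", "year"]
# Inner joins keep only country-year pairs present in every dataset.
merged = (
    suicides
    .merge(mental, on=keys, how="inner")
    .merge(economy, on=keys, how="inner")
)
print(merged)
```

With an inner join, any country-year pair missing from one source is dropped, so the merged table can be noticeably smaller than the main dataset.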

Stated hypotheses and results

The suicide rate differs statistically significantly across age groups. ✅
The suicide rate differs statistically significantly across generational groups. ✅
The suicide rate differs statistically significantly by gender. ✅
The suicide rate differs statistically significantly across wealth groups in the country. ✅
The suicide rate is negatively statistically significantly associated with the human development index (HDI) ⁉️
The suicide rate is positively statistically significantly associated with the rates of psychiatric disorders in the country. ✅
The level of GDP per capita is negatively statistically significantly associated with the suicide rate and with the rates of psychiatric disorders in the country. ❌
The suicide rate in rich countries is greater than or equal to that in poor countries. ✅
The suicide rate is positively statistically significantly associated with inflation and unemployment rates. ⁉️
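
As an illustration of how a group-difference hypothesis like the first one can be checked with scipy.stats, here is a minimal sketch on synthetic data. The Kruskal-Wallis test, the group labels, and the 0.05 significance level are assumptions; the tests actually used in the notebook may differ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic suicide rates for three hypothetical age groups
# (skewed, non-negative values drawn from gamma distributions).
groups = {
    "15-24": rng.gamma(shape=2.0, scale=2.0, size=200),
    "35-54": rng.gamma(shape=2.0, scale=5.0, size=200),
    "75+": rng.gamma(shape=2.0, scale=8.0, size=200),
}

# Kruskal-Wallis H-test: a rank-based test that does not assume normality.
statistic, p_value = stats.kruskal(*groups.values())

alpha = 0.05  # assumed significance level
print(f"H = {statistic:.2f}, p = {p_value:.3g}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

A significant result only says that at least one group differs; pairwise multiple-comparison procedures are needed to find which ones.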

Built regression model and its results

$R^2$ - the proportion of the variation in the dependent variable that is predictable from the independent variable(s)
$RMSE$ - root mean square error, a basic regression metric expressed in the units of the target variable
$RMSLE$ - root mean squared logarithmic error, used in model validation to reduce the impact of outliers in the target variable
min_samples_split - The minimum number of samples required to split an internal node
max_depth - The maximum depth of the tree
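
The three metrics defined above can be written out in plain Python as a sketch of the standard formulas. The function names and the toy values are illustrative and are not taken from the notebook.

```python
import math

def r2_score(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )

def rmsle(y_true, y_pred):
    """RMSE computed on log(1 + y); requires non-negative values."""
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred))
        / len(y_true)
    )

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]  # one prediction is off by 2

print(r2_score(y_true, y_pred), rmse(y_true, y_pred), rmsle(y_true, y_pred))
```

Because RMSLE compares logarithms, the same absolute error counts for less when the true value is large, which is why it dampens the effect of outliers.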

The hyperparameters were tuned with a halving grid search during model fitting.
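
A minimal sketch of a halving grid search over min_samples_split and max_depth for a decision tree regressor in sklearn. The synthetic data, the grid values, the scoring choice, and the cross-validation settings are assumptions and do not reproduce the configuration used in the notebook.

```python
import numpy as np
# HalvingGridSearchCV is experimental and must be enabled explicitly.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic regression data standing in for the real predictors and target.
X = rng.uniform(0, 10, size=(500, 3))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.5, size=500)

# Hypothetical grid; the values searched in the project are not stated.
param_grid = {
    "min_samples_split": [2, 10, 50],
    "max_depth": [3, 5, 10],
}

search = HalvingGridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # assumed scoring
    cv=3,
    factor=3,  # keep roughly the best third of candidates each round
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)
```

Halving search evaluates all candidates on a small subsample first and gives more data only to the best ones, which is cheaper than an exhaustive grid search on the full dataset.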

Metrics of final regression model:

| Predictors | $R^2$ | $RMSE$ | $RMSLE$ | min_samples_split | max_depth |
| --- | --- | --- | --- | --- | --- |
| age, generation, gender, country income level, alcoholism rate, depression rate | 0.86158 | 1.10141 | 0.30066 | 103 | 20 |

The final model explains about 86% of the variability in the data and predicts the suicide rate (per 100,000 population) with a root mean square error of about 1.1.

Including variables whose relationship with the target variable was statistically confirmed (particularly the alcoholism and depression rates) improved the predictive power of the model.

Values of the generation predictor (birth years):


Generation Z	1997-2012
Millennials	1981-1996
Generation X	1965-1980
Boomers	1946-1964
Silent	1928-1945
G.I. Generation	1901-1927

Summarizing

Project goals were achieved:
- Statistically significant socio-economic factors associated with the suicide rate (per 100,000 population) were identified.
- A model was built that predicts the target variable with high accuracy ($R^2$ of about 0.86).
Interesting observations:
- Men are the highest-risk group.
- Younger groups have a lower risk of suicide.
- Rates of psychiatric disorders help improve the prediction of the suicide rate.
A basis for further research has been obtained: each subgroup of observations (by age, sex and generation) can be analyzed in more detail.
