Titanic - Machine Learning from Disaster

The Projekt_mlynatom package processes the data from the Kaggle competition "Titanic - Machine Learning from Disaster" and predicts passenger survival using logistic regression and neural networks.

Installation

The package is not registered in the official Julia registry. Install it directly from GitHub by entering the following command in the Pkg REPL.

(@v1.8) pkg> add https://github.com/B0B36JUL-FinalProjects-2022/Projekt_mlynatom

Description of data from kaggle.com

Data Dictionary

Variable	Definition	Key
survival	Survival	0 = No, 1 = Yes
pclass	Ticket class	1 = 1st, 2 = 2nd, 3 = 3rd
sex	Sex
Age	Age in years
sibsp	# of siblings / spouses aboard the Titanic
parch	# of parents / children aboard the Titanic
ticket	Ticket number
fare	Passenger fare
cabin	Cabin number
embarked	Port of Embarkation	C = Cherbourg, Q = Queenstown, S = Southampton

Variable Notes

pclass: A proxy for socio-economic status (SES): 1st = Upper, 2nd = Middle, 3rd = Lower.

age: Age is fractional if less than 1. If the age is estimated, it is in the form xx.5.

sibsp: The dataset defines family relations in this way: Sibling = brother, sister, stepbrother, stepsister; Spouse = husband, wife (mistresses and fiancés were ignored).

parch: The dataset defines family relations in this way: Parent = mother, father; Child = daughter, son, stepdaughter, stepson. Some children travelled only with a nanny, therefore parch = 0 for them.

Usage

This package has two main parts: data processing and prediction of survival. All functions are implemented in the files in the src folder. Example usage is shown in classify.jl.

Data processing

This part (mainly src/data_preparation.jl) provides the data processing functions. The main focus is on encoding categorical values, filling in missing values, and deriving new features from existing columns.

Categorical values are converted to dummy encoding.
Missing age, fare and embarked values are filled in; see classify.jl.
Titles are extracted from the name column and used as a new feature.
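
The three steps above can be sketched in plain Julia. This is only an illustration, not the package's code: the function names `dummy_encode`, `fill_missing_median` and `extract_title` are hypothetical, and filling with the median is an assumed strategy (the values actually used are chosen in classify.jl).

```julia
using Statistics  # standard library, provides median

# Dummy (one-hot) encoding: one Bool column per distinct level.
# Hypothetical helper, not part of the package API.
function dummy_encode(column::AbstractVector)
    levels = sort(unique(column))
    return Dict(level => (column .== level) for level in levels)
end

# Replace missing entries with the median of the observed ones.
# Median imputation is an assumption made for this sketch.
function fill_missing_median(column::AbstractVector)
    fill_value = median(skipmissing(column))
    return [ismissing(v) ? fill_value : v for v in column]
end

# Extract the title between the comma and the first period,
# e.g. "Braund, Mr. Owen Harris" gives "Mr".
function extract_title(name::AbstractString)
    m = match(r",\s*([^.]+)\.", name)
    return m === nothing ? missing : String(strip(m.captures[1]))
end

# Small hand-written sample in the style of the Kaggle columns.
sex = ["male", "female", "female"]
age = [22.0, missing, 38.0]
passenger_names = [
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
]

sex_dummies = dummy_encode(sex)          # Dict with keys "female" and "male"
age_filled = fill_missing_median(age)    # missing replaced by the median of 22 and 38
titles = extract_title.(passenger_names) # one title per passenger
```

The extracted titles are themselves categorical, so in a pipeline like this they would typically be passed through the same dummy encoding as the other categorical columns.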

Prediction of survival

This part implements two options. The first is ridge logistic regression, trained with either a plain gradient descent step or an Adam step. The second is neural networks.

Neural networks are defined and trained using the Flux library with a custom training loop.
Ridge regularization is applied to the logistic regression to avoid overfitting.
The Adam optimization algorithm is used for better (and faster than plain gradient descent) convergence.
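
To make the difference between the two step types concrete, here is a minimal sketch of ridge logistic regression fitted once with plain gradient descent and once with Adam. It is not the package's implementation: the names `ridge_loss`, `ridge_grad`, `fit_gd`, `fit_adam` and `predict` are hypothetical, the hyperparameter defaults are arbitrary, and for simplicity the penalty is applied to every weight including the intercept.

```julia
using LinearAlgebra  # standard library, provides dot

sigmoid(z) = 1 / (1 + exp(-z))

# Mean negative log-likelihood plus the ridge penalty (λ/2)‖w‖².
# X is n×d, y holds 0/1 labels, w is the weight vector.
function ridge_loss(w, X, y, λ)
    p = sigmoid.(X * w)
    nll = -sum(y .* log.(p) .+ (1 .- y) .* log.(1 .- p)) / length(y)
    return nll + λ / 2 * dot(w, w)
end

# Gradient of the loss above with respect to w.
ridge_grad(w, X, y, λ) = X' * (sigmoid.(X * w) .- y) ./ length(y) .+ λ .* w

# Plain gradient descent with a fixed step size α.
function fit_gd(X, y; λ=0.1, α=0.1, iters=500)
    w = zeros(size(X, 2))
    for _ in 1:iters
        w -= α * ridge_grad(w, X, y, λ)
    end
    return w
end

# Adam: per-coordinate steps built from bias-corrected estimates of the
# first and second moments of the gradient.
function fit_adam(X, y; λ=0.1, α=0.1, β1=0.9, β2=0.999, ϵ=1e-8, iters=500)
    w = zeros(size(X, 2))
    m = zero(w)  # first-moment estimate
    v = zero(w)  # second-moment estimate
    for t in 1:iters
        g = ridge_grad(w, X, y, λ)
        m = β1 .* m .+ (1 - β1) .* g
        v = β2 .* v .+ (1 - β2) .* g .^ 2
        m_hat = m ./ (1 - β1^t)
        v_hat = v ./ (1 - β2^t)
        w -= α .* m_hat ./ (sqrt.(v_hat) .+ ϵ)
    end
    return w
end

# Classify as survived (true) when the predicted probability is at least 0.5.
predict(w, X) = sigmoid.(X * w) .>= 0.5

# Toy data: an intercept column plus one feature.
X = [1.0 -2.0; 1.0 -1.0; 1.0 1.0; 1.0 2.0]
y = [0.0, 0.0, 1.0, 1.0]

w_gd = fit_gd(X, y)
w_adam = fit_adam(X, y)
```

The only difference between the two fits is the update rule: gradient descent scales the raw gradient by α, while Adam rescales each coordinate by a running estimate of its gradient magnitude. Larger values of λ pull the weights toward zero, which is how the ridge term limits overfitting.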

