This repository is my entry for the Kaggle Titanic machine learning competition, in which contestants predict which passengers survived the Titanic disaster using ML methods of their own choice. Here I explain how I built the predictive models that placed me in the top 10% of almost 20,000 contestants.
The following steps are used:
- Data visualization & pre-processing: The given datasets are summarized with descriptive statistics, missing values are filled in by imputation, and the resulting data are visualized and analysed.
- Feature analysis: Each feature and the correlations between features are described, and the datasets are prepared for building the predictive models.
- Predictive analysis using machine learning techniques: The selected machine learning techniques are explained, classifiers are trained on the training dataset, and each model's performance is evaluated with K-fold cross-validation on the training dataset.
The best classifier is chosen based on the performance analysis in the third step. This classifier is then used to predict survival for the test dataset.
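
The workflow above (impute, cross-validate, pick the best classifier, predict) can be sketched roughly as follows. This is only an illustration, not the code used in this repository: the toy passenger rows are synthetic, the field names `sex` and `age`, the median imputation, and the two deliberately simple classifiers (`MajorityClassifier`, `SexRuleClassifier`) are all assumptions made for the example. It uses only the Python standard library so that it runs without the competition files.

```python
import statistics
from collections import Counter
import random

def impute_median(rows, key):
    """Fill missing (None) values of `key` with the median of the observed values."""
    median = statistics.median(r[key] for r in rows if r[key] is not None)
    return [{**r, key: median if r[key] is None else r[key]} for r in rows]

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

class MajorityClassifier:
    """Baseline: always predicts the most common training label."""
    def fit(self, rows, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self
    def predict(self, rows):
        return [self.label for _ in rows]

class SexRuleClassifier:
    """One-feature rule: predicts the most common training label for each sex."""
    def fit(self, rows, labels):
        by_sex = {}
        for r, y in zip(rows, labels):
            by_sex.setdefault(r["sex"], []).append(y)
        self.rule = {s: Counter(ys).most_common(1)[0][0] for s, ys in by_sex.items()}
        self.default = Counter(labels).most_common(1)[0][0]
        return self
    def predict(self, rows):
        return [self.rule.get(r["sex"], self.default) for r in rows]

def cross_val_accuracy(make_model, rows, labels, k=5, seed=0):
    """Mean accuracy over k folds; each fold is held out once for evaluation."""
    scores = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [i for i in range(len(rows)) if i not in held]
        model = make_model().fit([rows[i] for i in train], [labels[i] for i in train])
        preds = model.predict([rows[i] for i in held_out])
        scores.append(sum(p == labels[i] for p, i in zip(preds, held_out)) / len(held_out))
    return statistics.mean(scores)

# Synthetic toy data (NOT the Kaggle dataset): 40 passengers, some ages missing.
rows = [{"sex": "female" if i % 2 == 0 else "male",
         "age": None if i % 5 == 0 else 20 + i} for i in range(40)]
# Label is 1 for female and 0 for male, flipped for every 7th passenger as noise.
labels = [int((r["sex"] == "female") != (i % 7 == 0)) for i, r in enumerate(rows)]

rows = impute_median(rows, "age")  # pre-processing step

# Evaluate every candidate with 5-fold cross-validation on the training data.
candidates = {"majority": MajorityClassifier, "sex_rule": SexRuleClassifier}
scores = {name: cross_val_accuracy(cls, rows, labels) for name, cls in candidates.items()}

# Refit the best-scoring classifier on all training data, then predict unseen rows.
best_name = max(scores, key=scores.get)
best_model = candidates[best_name]().fit(rows, labels)
test_rows = [{"sex": "female", "age": 30}, {"sex": "male", "age": 45}]
predictions = best_model.predict(test_rows)
print(scores, best_name, predictions)
```

Selecting the model on cross-validated accuracy rather than on training accuracy keeps the comparison honest, because every score is computed on rows the model did not see while fitting.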
Visit the following link to get a lite version of the report.