SpaceShip-Titanic

Machine Learning Project

This is a competition dataset from Kaggle: https://www.kaggle.com/competitions/spaceship-titanic

Steps Involved:

Handling String Data:

The HomePlanet column has 3 distinct values: Europa, Earth and Mars.

They were encoded as 0, 1 and 2 respectively.

Similarly, the Destination column has 3 distinct values: TRAPPIST-1e, PSO J318.5-22 and 55 Cancri e.

They were encoded as 0, 1 and 2 respectively.

CryoSleep holds boolean values, which were converted to 0s and 1s.

VIP and Transported also hold boolean values, which were converted to 0s and 1s in the same way.
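
The encoding described above can be sketched with pandas as follows. This is only a minimal illustration, not the repository's actual code: the function name encode_strings and the constant names are hypothetical, and the column names are assumed to match the Kaggle CSV headers.

```python
import pandas as pd

# Codes taken from the text above; constant names are illustrative only.
HOME_PLANET_CODES = {"Europa": 0, "Earth": 1, "Mars": 2}
DESTINATION_CODES = {"TRAPPIST-1e": 0, "PSO J318.5-22": 1, "55 Cancri e": 2}
BOOL_COLUMNS = ["CryoSleep", "VIP", "Transported"]


def encode_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the string and boolean columns encoded as numbers."""
    out = df.copy()
    # Values not present in the mapping (including missing ones) become NaN.
    out["HomePlanet"] = out["HomePlanet"].map(HOME_PLANET_CODES)
    out["Destination"] = out["Destination"].map(DESTINATION_CODES)
    for col in BOOL_COLUMNS:
        # Skip columns that are absent, e.g. a frame without the target column.
        if col in out.columns:
            out[col] = out[col].map({True: 1, False: 0})
    return out
```

Missing values are left as NaN here so that they can be handled in a separate step.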

Similarly, for the Cabin column:

Data visualization showed that the first and last characters of the Cabin string are the significant parts. Each of them was collected into its own list, the distinct values were counted, and each distinct value was assigned a numerical code.
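
A minimal sketch of the Cabin step, under stated assumptions: the new column names CabinDeck and CabinSide are hypothetical, the codes are assigned in sorted order of the distinct values (the README does not say which value got which number), and dropping the original Cabin column afterwards is also an assumption.

```python
import pandas as pd


def encode_cabin(df: pd.DataFrame) -> pd.DataFrame:
    """Split Cabin into its first and last characters and encode both as integers."""
    out = df.copy()
    # First and last character of the Cabin string; missing cabins stay NaN.
    out["CabinDeck"] = out["Cabin"].str[0]
    out["CabinSide"] = out["Cabin"].str[-1]
    for col in ["CabinDeck", "CabinSide"]:
        # Assumption: codes follow the sorted order of the distinct values.
        categories = sorted(out[col].dropna().unique())
        codes = {value: code for code, value in enumerate(categories)}
        out[col] = out[col].map(codes)
    # Assumption: the raw string is no longer needed once it is encoded.
    return out.drop(columns=["Cabin"])
```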

Handling Missing Data:

The columns with missing data were the following:

1. HomePlanet
2. CryoSleep
3. Cabin
4. Destination
5. Age
6. VIP
7. RoomService
8. FoodCourt
9. ShoppingMall
10. Spa
11. VRDeck
12. Name

Name was dropped, since it has little predictive importance and also contains missing data.

First, missing Age values were filled with the mean of the Age column.

A bar plot of VIP against RoomService shows a correlation between them: passengers who spend more than 300 on RoomService are VIP, and those who spend less than 300 are not.

Missing VIP values were filled using this rule.

Similarly, a plot of VIP against HomePlanet shows that when VIP == 0 the HomePlanet is 2, and otherwise it is 1. Missing HomePlanet values were filled using this rule.

All rows that still contained NaN values were dropped, since they could not be handled any further.
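
The missing-data rules above could be written roughly as below. This is a sketch under assumptions, not the repository's code: the function name fill_missing and the constant name are hypothetical, VIP and HomePlanet are assumed to be already encoded as numbers, and a missing VIP value is only filled when RoomService is known (the README does not say what happens when both are missing).

```python
import numpy as np
import pandas as pd

ROOM_SERVICE_VIP_THRESHOLD = 300  # spend threshold quoted in the text


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the fill rules described above, then drop rows that still have NaN."""
    out = df.drop(columns=["Name"], errors="ignore")

    # Age: fill with the column mean.
    out["Age"] = out["Age"].fillna(out["Age"].mean())

    # VIP: 1 if RoomService spend is above the threshold, else 0.
    # Assumption: leave VIP missing when RoomService itself is missing.
    vip_guess = (out["RoomService"] > ROOM_SERVICE_VIP_THRESHOLD).astype(float)
    vip_guess = vip_guess.where(out["RoomService"].notna())
    out["VIP"] = out["VIP"].fillna(vip_guess)

    # HomePlanet: 2 when VIP == 0, otherwise 1.
    planet_guess = pd.Series(np.where(out["VIP"] == 0, 2, 1), index=out.index)
    out["HomePlanet"] = out["HomePlanet"].fillna(planet_guess)

    # Everything that could not be filled is dropped.
    return out.dropna().reset_index(drop=True)
```

Keeping the rules in one function makes it straightforward to run the same steps on both train.csv and test.csv.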

Apply the same steps to test.csv as well.

Choose model to predict:

Several models were compared to select which one works best for this case. The models I used are:

'xgboost' : xgb.XGBClassifier(),
'lightgbm' : lgb.LGBMClassifier(),
'gradient boosting' : GradientBoostingClassifier(),
'random forest' : RandomForestClassifier(),
'logistic regression': LogisticRegression(),
'naive bayes': GaussianNB(),

xgboost, lightgbm and gradient boosting performed better than the others, so these three are the ones to try.
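
The README does not say how the models were compared. One possible approach, shown here purely as an assumption, is to rank them by mean cross-validated accuracy. The helper name rank_models is hypothetical, and only the scikit-learn models are listed to keep the sketch free of extra dependencies. xgb.XGBClassifier() and lgb.LGBMClassifier() can be added to the dictionary in the same way if those packages are installed.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Subset of the models named above; max_iter is raised only to help convergence.
models = {
    "gradient boosting": GradientBoostingClassifier(),
    "random forest": RandomForestClassifier(),
    "logistic regression": LogisticRegression(max_iter=1000),
    "naive bayes": GaussianNB(),
}


def rank_models(models, X, y, cv=5):
    """Return (name, mean CV accuracy) pairs, best model first."""
    scores = {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling rank_models(models, X, y) with the preprocessed features X and the Transported column y gives a list from which the top candidates can be picked.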
