A detailed and comprehensive review of this project is available here: https://medium.com/@vaibhavshukla182/pump-it-up-data-mining-the-water-table-f903d4cfc7a8
‘Pump It Up’ is a competition organized by DrivenData around a problem raised by the Tanzanian Ministry of Water. A training dataset with information on nearly sixty thousand water points across Tanzania is provided, and the task is to build a model that predicts, on a test dataset, which water points are functional, which are non-functional, and which are functional but need repair.
The data is visualised in Python as simply as possible. Plots and charts are used to explain what the different features mean and how they relate to each other, and they show a clearly visible relationship between some features and the target variable, ‘status_group’.
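As a rough illustration of how a feature can be related to ‘status_group’, here is a minimal sketch using a pandas cross-tabulation. The rows are made up for the example, and the choice of `quantity` as the feature is an assumption, not necessarily one of the plots in the notebook.

```python
import pandas as pd

# Made-up rows for illustration only; the real training set has
# roughly sixty thousand water points.
df = pd.DataFrame({
    "quantity": ["enough", "enough", "dry", "dry", "insufficient", "enough"],
    "status_group": [
        "functional", "functional", "non functional",
        "non functional", "functional needs repair", "non functional",
    ],
})

# Share of each status within every value of the feature (each row sums to 1).
share = pd.crosstab(df["quantity"], df["status_group"], normalize="index")
print(share.round(2))

# With matplotlib installed, the same table can be drawn as stacked bars:
# share.plot(kind="bar", stacked=True)
```

A feature whose rows look very different from each other in this table is likely to be useful for predicting the target.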
While cleaning the data, I removed some features that were similar to each other and created one new feature.
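The cleaning step can be sketched roughly as below. This README does not say which columns were dropped or which feature was added, so the dropped columns (`quantity_group`, `payment_type`), the derived `pump_age` feature, and both helper functions are hypothetical choices for illustration. Treating a `construction_year` of 0 as missing is also an assumption.

```python
import pandas as pd

# Made-up sample rows; column names are assumed for this sketch.
df = pd.DataFrame({
    "quantity": ["enough", "dry", "insufficient"],
    "quantity_group": ["enough", "dry", "insufficient"],
    "payment": ["pay monthly", "never pay", "never pay"],
    "payment_type": ["monthly", "never pay", "never pay"],
    "construction_year": [1999, 2008, 0],
    "date_recorded": ["2011-03-14", "2013-02-04", "2012-10-01"],
})


def drop_redundant(frame, columns):
    """Return a copy of the frame without the given near-duplicate columns."""
    return frame.drop(columns=columns)


def add_pump_age(frame):
    """Add a hypothetical pump_age feature: year recorded minus year built."""
    out = frame.copy()
    recorded_year = pd.to_datetime(out["date_recorded"]).dt.year
    # Assumption: a construction_year of 0 means "unknown", so mask it to NaN.
    built = out["construction_year"].where(out["construction_year"] > 0)
    out["pump_age"] = recorded_year - built
    return out


cleaned = add_pump_age(drop_redundant(df, ["quantity_group", "payment_type"]))
print(cleaned)
```

Keeping each step in its own small function makes it easy to apply exactly the same cleaning to the test dataset.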
The model was evaluated mainly with two classifiers: 1) Random Forest Classifier 2) XGBoost
Of these, the Random Forest Classifier (RFC) proved to be the better one after trying various configurations of both classifiers.
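A comparison of this kind can be sketched with cross-validation, as below. This is not the exact setup used in the project: the data is synthetic, accuracy as the metric and the hyperparameters are assumptions, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the example needs only scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic three-class data standing in for the cleaned water point features.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# GradientBoostingClassifier is a stand-in for XGBoost in this sketch.
models = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

# Mean accuracy over 3 folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()
    for name, model in models.items()
}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("best:", best)
```

If the xgboost package is installed, `xgboost.XGBClassifier` follows the same scikit-learn estimator interface and can be placed in the `models` dictionary instead of the stand-in.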
Your suggestions for improving my model are most humbly welcome.