This problem concerns the Shinkansen (bullet train) of Japan. The aim is to determine the relative importance of each parameter with regard to its contribution to the passenger travel experience. A random sample of individuals who travelled on these trains is provided. The on-time performance of the trains, along with the passengers’ information, is published in the CSV file named ‘Traveldata_train’. These passengers were later asked to provide feedback on various parameters related to the journey, along with their overall experience. The collected responses are available in the survey report CSV labelled ‘Surveydata_train’.
In the survey, each passenger was explicitly asked whether they were delighted with their overall travel experience; the answer is captured in the survey report under the variable labelled ‘Overall_Experience’.
The objective of this exercise is to understand which parameters play an important role in swaying passenger feedback towards the positive end of the scale. Test data containing the travel data and survey data of passengers is also provided. Both the test data and the training data were collected at the same time and belong to the same company.
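As a rough illustration of how the travel and survey data described above might be combined into one modeling table, here is a minimal sketch. It assumes the two files share a passenger key column, called `ID` here; that name, the other column names except `Overall_Experience`, and the 0/1 encoding of the target are assumptions made for illustration and are not taken from the data description. Small in-memory frames stand in for the CSV files.

```python
import pandas as pd

# Tiny in-memory stand-ins for the two CSV files. With the real data these
# frames would be loaded with pd.read_csv from the travel and survey files.
travel = pd.DataFrame({
    "ID": [1, 2, 3],                              # assumed shared passenger key
    "Arrival_Delay_in_Mins": [0.0, 15.0, None],   # hypothetical column name
})
survey = pd.DataFrame({
    "ID": [1, 2, 3],
    "Seat_Comfort": ["Good", "Poor", "Excellent"],  # hypothetical column name
    "Overall_Experience": [1, 0, 1],                # target; 0/1 coding assumed
})

# Join the two sources on the key. validate="one_to_one" raises an error if a
# passenger key is duplicated on either side, which guards against silent
# row duplication in the merged table.
merged = travel.merge(survey, on="ID", how="inner", validate="one_to_one")

# Separate the predictors from the target variable.
X = merged.drop(columns=["ID", "Overall_Experience"])
y = merged["Overall_Experience"]

print(merged.shape)
print(merged.isna().sum())  # missing values per column, a first cleaning check
```

The inner join keeps only passengers present in both sources; if some passengers could be missing from one file, a left join plus an explicit check for unmatched rows may be the safer choice.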
Python, NumPy, Pandas, EDA, Seaborn, Matplotlib, Scikit-learn
Identify key factors that contribute to a positive passenger experience.
Data cleaning, EDA, Feature engineering, Predictive modeling (Linear Regression, Logistic Regression, Gaussian NB, AdaBoost Classifier, Gradient Boosting, Bagging, Random Forest, Decision Tree), Evaluation
Identification of important factors for improving the overall passenger experience.
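The modeling and factor-identification steps listed above can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual pipeline: the data is synthetic, the feature names (`seat_comfort`, `onboard_wifi`, `noise`) are hypothetical, only two of the listed models are compared, and permutation importance is used here as one possible way to rank factors.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in features (hypothetical names); ratings on a 0-5 scale.
X = pd.DataFrame({
    "seat_comfort": rng.integers(0, 6, n),
    "onboard_wifi": rng.integers(0, 6, n),
    "noise": rng.normal(size=n),  # unrelated to the target by construction
})

# Synthetic binary target: driven strongly by seat_comfort, weakly by
# onboard_wifi, plus random noise. Split at the median to balance classes.
score = 1.0 * X["seat_comfort"] + 0.3 * X["onboard_wifi"] + rng.normal(scale=0.5, size=n)
y = (score > score.median()).astype(int)

# Compare candidate classifiers with 5-fold cross-validated accuracy.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")

# Rank factors on held-out data: permutation importance measures how much
# accuracy drops when a single feature's values are shuffled.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
forest = models["random_forest"].fit(X_train, y_train)
result = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)
importance = pd.Series(result.importances_mean, index=X.columns).sort_values(ascending=False)
print(importance)
```

Permutation importance is computed on the held-out split so that the ranking reflects what the model relies on for unseen passengers; impurity-based importances from tree ensembles are an alternative, though they can favor features with many distinct values.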