Crime Category Prediction Challenge
In this project, I built models to predict the category of a crime from a dataset of recorded crime incidents. The work involved the following steps:
- Data Preprocessing: Handled missing values, encoded categorical features, and normalized numerical data to prepare the dataset for modeling.
- Exploratory Data Analysis (EDA): Explored the dataset for trends, such as how incidents vary over time and how they are distributed geographically.
- Feature Engineering: Created new features, such as time-based features derived from incident dates and times, alongside the encoded categorical variables.
- Model Training and Optimization: Trained and compared several models: Logistic Regression, Random Forest, XGBoost, and LightGBM. Used grid search with cross-validation to tune their hyperparameters.
- Evaluation: Achieved 95.7% accuracy on the test set, indicating that the feature engineering and model selection were effective.
- Tools and Libraries: Used scikit-learn for modeling and model selection, the XGBoost and LightGBM libraries for the gradient-boosted models, pandas and NumPy for data manipulation, and Matplotlib and Seaborn for data visualization.
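The time-based features mentioned under Feature Engineering and EDA can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: it assumes a hypothetical `timestamp` column and a three-row toy frame, and the `is_night` window (22:00 to 05:59) is an arbitrary illustrative threshold.

```python
import pandas as pd


def add_time_features(df: pd.DataFrame, ts_col: str = "timestamp") -> pd.DataFrame:
    """Derive calendar features from a timestamp column (column name is hypothetical)."""
    out = df.copy()
    ts = pd.to_datetime(out[ts_col])
    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = ts.dt.month
    out["year"] = ts.dt.year
    out["is_weekend"] = (ts.dt.dayofweek >= 5).astype(int)
    # Illustrative late-night flag (22:00-05:59); not a threshold taken from the project
    out["is_night"] = ((out["hour"] >= 22) | (out["hour"] < 6)).astype(int)
    return out


# Toy incidents standing in for the real dataset
incidents = pd.DataFrame({
    "timestamp": ["2015-05-13 23:53:00", "2015-05-16 08:15:00", "2015-05-17 14:30:00"],
    "category": ["ASSAULT", "THEFT", "VANDALISM"],
})
features = add_time_features(incidents)
print(features[["hour", "day_of_week", "is_weekend", "is_night"]])
```

The same derived columns support the temporal side of the EDA, for example `features.groupby("hour").size()` for incident counts per hour, or `pd.crosstab(features["day_of_week"], features["category"])` for category counts by weekday.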
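A preprocessing step like the one described under Data Preprocessing (missing values, categorical encoding, numeric scaling) is commonly expressed as a scikit-learn `ColumnTransformer`. The sketch below assumes hypothetical column names and a toy frame; median/mode imputation, one-hot encoding, and `StandardScaler` are illustrative choices, not necessarily the ones used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical schema; the real dataset's columns may differ.
numeric_cols = ["longitude", "latitude", "hour"]
categorical_cols = ["district", "day_of_week"]

numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing numbers with the column median
    ("scale", StandardScaler()),                   # rescale to zero mean, unit variance
])
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),  # fill missing labels with the mode
    ("onehot", OneHotEncoder(handle_unknown="ignore")),   # unseen categories encode as all zeros
])

preprocess = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", categorical_pipe, categorical_cols),
])

# Toy rows with deliberately missing values
X = pd.DataFrame({
    "longitude": [-122.42, -122.41, np.nan, -122.40],
    "latitude": [37.77, np.nan, 37.78, 37.79],
    "hour": [23, 8, 14, 2],
    "district": ["NORTHERN", "MISSION", np.nan, "MISSION"],
    "day_of_week": ["Wed", "Sat", "Sun", "Wed"],
})
X_prepared = preprocess.fit_transform(X)
print(X_prepared.shape)  # 3 scaled numeric columns + 2 district + 3 weekday one-hot columns
```

Because imputation, encoding, and scaling are all fitted inside one transformer, fitting it on the training split and calling `transform` on the test split keeps test-set statistics out of preprocessing. If "normalized" is meant as rescaling to the range [0, 1], `MinMaxScaler` would take the place of `StandardScaler`.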
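Model selection with grid search and cross-validation, followed by a single test-set evaluation, might look like the sketch below. It uses synthetic data from `make_classification` in place of the crime dataset, and the two models and their parameter grids are assumptions for illustration. XGBoost's `XGBClassifier` and LightGBM's `LGBMClassifier` follow the scikit-learn estimator interface, so they could be added to `candidates` the same way; they are left out here so the example depends on scikit-learn only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic multi-class stand-in for the engineered crime features (not the project's data)
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Illustrative candidates and grids; not the project's actual search space
candidates = {
    "logreg": (
        Pipeline([("scale", StandardScaler()),
                  ("clf", LogisticRegression(max_iter=1000))]),
        {"clf__C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 10]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=cv, scoring="accuracy")
    search.fit(X_train, y_train)  # cross-validation runs on the training split only
    results[name] = search
    print(f"{name}: best CV accuracy={search.best_score_:.3f}, params={search.best_params_}")

# Pick the model by cross-validated score, then evaluate once on the held-out test split
best_name = max(results, key=lambda n: results[n].best_score_)
best_model = results[best_name].best_estimator_  # already refit on the full training split
test_acc = accuracy_score(y_test, best_model.predict(X_test))
print(f"selected {best_name}; test accuracy={test_acc:.3f}")
```

Choosing the model by its cross-validated score and touching the test set only once at the end keeps the reported test accuracy from being used, even indirectly, for tuning.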