This project analyzes the top 100 highest-rated movies of the past decade using Python to uncover insights about movies, actors, voters, ratings, and box office collections. The analysis lives in a Jupyter Notebook, which keeps the code, visualizations, and key findings together in one readable document.
- Load and inspect the dataset.
- Understand the data dictionary for column meanings.
- Handle missing values.
- Correct data inconsistencies.
- Prepare the dataset for analysis.
- Analyze the distribution of ratings and collections.
- Find the top-rated actors and movies.
- Investigate voter demographics and their preferences.
- Create meaningful plots for interpreting the results.
- Use subplots for effective comparison where needed.
- Draw actionable conclusions from the analysis.
- Highlight interesting patterns and trends in the data.
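
The loading and cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`Title`, `Rating`, `Gross`), the sample rows, and the `load_and_clean` helper are all assumptions made for the example, and the real `movies_dataset.csv` schema may differ.

```python
import io

import pandas as pd

# Hypothetical rows standing in for movies_dataset.csv.
# Column names and values are invented for illustration only.
SAMPLE_CSV = """Title,Rating,Gross
Movie A,8.5,120.0
Movie B,,95.5
Movie C,7.9,
movie a ,8.5,120.0
Movie D,8.1,80.0
"""


def load_and_clean(source):
    """Load a movie table and apply a few basic cleaning steps."""
    df = pd.read_csv(source)

    # Correct inconsistencies: trim whitespace and unify title casing,
    # then drop rows that turn out to be duplicates of the same title.
    df = df.assign(Title=df["Title"].str.strip().str.title())
    df = df.drop_duplicates(subset="Title")

    # Handle missing values: a movie without a rating cannot be ranked,
    # so drop it; fill missing collections with the median instead.
    df = df.dropna(subset=["Rating"])
    df = df.assign(Gross=df["Gross"].fillna(df["Gross"].median()))

    return df.reset_index(drop=True)


movies = load_and_clean(io.StringIO(SAMPLE_CSV))
print(movies)
print(movies[["Rating", "Gross"]].describe())
```

To try this against the real file, pass its path to `load_and_clean` instead of the in-memory sample, after adjusting the column names to match the data dictionary. Whether to drop or impute a missing value is a judgment call that depends on the column; the median fill here is just one common choice.
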
- Programming Language: Python
- Libraries Used:
  - Pandas for data manipulation
  - Matplotlib and Seaborn for data visualization
  - NumPy for numerical operations
- Movie Assignment.ipynb: The Jupyter Notebook containing the Python code, visualizations, and inferences.
- movies_dataset.csv: The dataset used for analysis (included only if redistribution is permitted).
- README.md: Documentation for the repository.
- Identified trends in ratings and collections over the past decade.
- Highlighted the most influential actors and their performances.
- Explored how voter demographics influence movie ratings.
- Uncovered correlations between box office collections and ratings.
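
As a rough illustration of how a relationship between collections and ratings might be measured, the sketch below computes two correlation coefficients with Pandas. The `Rating` and `Gross` column names and all the numbers are made up for the example; they are not results from the notebook.

```python
import pandas as pd

# Invented values for illustration; not taken from the real dataset.
sample = pd.DataFrame(
    {
        "Rating": [8.8, 8.4, 8.1, 7.9, 7.6],
        "Gross": [292.6, 188.0, 415.0, 128.0, 53.4],
    }
)

# Pearson correlation measures the strength of a linear relationship.
pearson = sample["Rating"].corr(sample["Gross"])

# Spearman correlation is Pearson correlation applied to the ranks,
# which makes it less sensitive to a few very large box office totals.
spearman = sample["Rating"].rank().corr(sample["Gross"].rank())

print(f"Pearson:  {pearson:.2f}")
print(f"Spearman: {spearman:.2f}")
```

Box office figures tend to be heavily skewed, so it can be worth comparing both coefficients before drawing conclusions. Either way, a correlation on 100 movies describes an association and does not show that higher ratings cause higher collections.
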
This project provides an in-depth analysis of the top 100 highest-rated movies of the past decade, covering ratings, box office collections, and the influence of actors and voters. Through data exploration, cleaning, and visualization, it surfaces trends and findings that can inform future decisions in the movie industry.
Feel free to explore the notebook and run the code to dive deeper into the analysis!