Pandas generally performs better than NumPy for 500K rows or more. For 50K to 500K rows, it is a toss-up between Pandas and NumPy, depending on the kind of operation.
- In Class Instruction: 4 Hours
- In Class code along Dataset: Weather Dataset
- Project Dataset: Indian Premier League
- Estimated Time to complete Project Tasks: 1 Hour
- Total sub tasks within the Project: 6
- Complexity of sub tasks: Mid to High
- Points to be scored: 700
- Why should you care about this project: This project challenges you to manipulate large datasets without using conventional programming techniques to extract business insights.
- Skills Rehearsed
- Python | NumPy | Pandas
- Instructor led concept onboarding
- Code Alongs
- In Class Quiz Administration
- Periodic Recap - Closer to the end of session
- In Class Assignments - Motivation
- Take Away Assignments
Learn about the Pandas Series & DataFrame, the de facto standard to work with tabular data in Python. You will get hands-on practice with creating, manipulating and accessing the information you need from these data structures.
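As a minimal sketch of what creating, manipulating and accessing these structures can look like, the snippet below builds one Series and one DataFrame. The weather values, city names and column names (`temp_c`, `humidity`, `temp_f`) are invented for illustration and are not taken from the course's Weather Dataset.

```python
import pandas as pd

# Hypothetical readings, invented for illustration only.
temps = pd.Series([21.5, 19.0, 24.3], index=["Mon", "Tue", "Wed"], name="temp_c")

# A DataFrame is a table whose columns are Series sharing one index.
weather = pd.DataFrame(
    {
        "city": ["Pune", "Delhi", "Mumbai"],
        "temp_c": [21.5, 19.0, 24.3],
        "humidity": [60, 45, 80],
    }
)

# Accessing: look up a Series value by its label, and pull one column out of a DataFrame.
tuesday_temp = temps["Tue"]
temp_column = weather["temp_c"]

# Manipulating: derive a new column with a vectorized expression (no loop).
weather["temp_f"] = weather["temp_c"] * 9 / 5 + 32

print(tuesday_temp)
print(weather)
```

The new column is computed for every row at once, which is the style of operation the rest of this module relies on.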
After this lesson, you'll be able to work with:
- Selection, Indexing and Filters
- Filters
- Introduction to Pandas - Pandas Series
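The topics listed above can be sketched in a few lines. This is only an illustration under assumed data: the DataFrame, its day labels and its column names are hypothetical and not part of the course datasets.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
weather = pd.DataFrame(
    {"temp_c": [21.5, 19.0, 24.3, 30.1], "humidity": [60, 45, 80, 55]},
    index=["Mon", "Tue", "Wed", "Thu"],
)

# Selection by label: row "Tue", column "temp_c".
by_label = weather.loc["Tue", "temp_c"]

# Selection by integer position: first row, second column.
by_position = weather.iloc[0, 1]

# Filter: a boolean mask keeps only the rows where the condition is True.
warm_days = weather[weather["temp_c"] > 22]

# Combined filter: wrap each condition in parentheses and join with & (and) or | (or).
warm_and_dry = weather[(weather["temp_c"] > 22) & (weather["humidity"] < 60)]

print(warm_days)
print(warm_and_dry)
```

`loc` works with index and column labels, while `iloc` works with integer positions; filters are built from boolean Series rather than loops.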
Check the Jupyter Notebook in the top right of the screen
In the IPL, teams representing Indian cities compete each year. Chris Gayle is the highest run scorer in the IPL. Do you know who the second-highest run scorer is, and can you find out without using a ‘for’ loop? This module helps you determine the second-highest run scorer by manipulating large datasets to extract business insights.
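One possible loop-free approach is to group the ball-by-ball records by batsman, sum the runs, and sort. The sketch below uses a tiny made-up table; the column names `batsman` and `batsman_runs` and the player names are assumptions for illustration and may not match the project dataset.

```python
import pandas as pd

# Hypothetical ball-by-ball records, invented for illustration only.
deliveries = pd.DataFrame(
    {
        "batsman": ["Player A", "Player B", "Player A", "Player C",
                    "Player B", "Player C", "Player A"],
        "batsman_runs": [4, 6, 1, 2, 4, 6, 6],
    }
)

# Total runs per batsman, computed without an explicit loop.
totals = deliveries.groupby("batsman")["batsman_runs"].sum()

# Sort from highest to lowest, then take the entry at position 1 (the second).
ranked = totals.sort_values(ascending=False)
second_highest = ranked.index[1]
second_runs = ranked.iloc[1]

print(second_highest, second_runs)
```

On real data, ties in total runs would need an explicit rule, since this sketch simply takes whichever row lands in second position after sorting.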