
A set of real-world data science insights derived with the Python pandas library


The-alpha-male/Sales_Data_Insights

 
 


Sales_Data_Insights

I used Python with pandas and Matplotlib to answer business questions about 12 months of sales data. The data contains hundreds of thousands of electronics store purchases, broken down by month, product type, cost, purchase address, and so on.

I started by cleaning the data. Tasks in this section include:

  • Dropping NaN values from the DataFrame
  • Removing rows based on a condition
  • Changing the type of columns (to_numeric, to_datetime, astype)
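
The three cleaning tasks above can be sketched on a tiny hand-made DataFrame. This is only an illustration: the column names ("Order ID", "Quantity Ordered", "Price Each", "Order Date"), the sample rows, and the date format are assumptions, not taken from the repository's data.

```python
import pandas as pd

# Hypothetical raw data: one all-NaN row and one repeated header row,
# as can happen when several CSV exports are stitched together.
raw = pd.DataFrame({
    "Order ID": ["1001", None, "Order ID", "1002"],
    "Product": ["USB-C Cable", None, "Product", "Monitor"],
    "Quantity Ordered": ["2", None, "Quantity Ordered", "1"],
    "Price Each": ["11.95", None, "Price Each", "149.99"],
    "Order Date": ["04/19/19 08:46", None, "Order Date", "12/30/19 21:10"],
})

# 1. Drop rows in which every value is NaN.
clean = raw.dropna(how="all")

# 2. Remove rows based on a condition (here: stray header rows).
clean = clean[clean["Order ID"] != "Order ID"].copy()

# 3. Change column types with to_numeric, astype, and to_datetime.
clean["Quantity Ordered"] = pd.to_numeric(clean["Quantity Ordered"])
clean["Price Each"] = clean["Price Each"].astype(float)
clean["Order Date"] = pd.to_datetime(clean["Order Date"], format="%m/%d/%y %H:%M")
```

After these steps the numeric and datetime columns can be used directly in arithmetic and in `.dt` accessors.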

Once the data was clean, I moved on to the data exploration section, where I explored five high-level business questions about the data:

  • What was the best month for sales? How much was earned that month?
  • What city sold the most product?
  • What time should one display advertisements to maximize the likelihood of customers buying the product?
  • What products are most often sold together?
  • What product sold the most? Why do you think it sold the most?
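
As one example, the "sold together" question could be approached by grouping rows by order and counting product pairs. The sketch below is an assumption about how this might be done, not necessarily the notebook's method; the column names "Order ID" and "Product" and the sample rows are hypothetical.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical order lines; rows sharing an "Order ID" belong to one purchase.
orders = pd.DataFrame({
    "Order ID": [1, 1, 2, 3, 3, 4],
    "Product": ["Phone", "Charging Cable", "Monitor",
                "Phone", "Charging Cable", "Phone"],
})

# Collect the products of each order, then count every unordered pair.
pair_counts = Counter()
for products in orders.groupby("Order ID")["Product"].apply(list):
    # Sorting makes ("A", "B") and ("B", "A") count as the same pair.
    pair_counts.update(combinations(sorted(set(products)), 2))

top_pair, top_count = pair_counts.most_common(1)[0]
```

Orders with a single product contribute no pairs, so only multi-item orders affect the counts.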

To answer these questions I walked through many different pandas and Matplotlib methods. They include:

  • Concatenating multiple CSV files to create a new DataFrame (pd.concat)
  • Adding columns
  • Parsing cells as strings to make new columns (.str)
  • Using the .apply() method
  • Using groupby to perform aggregate analysis
  • Plotting bar charts and line graphs to visualize the results
  • Labeling graphs
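
The methods listed above can be combined roughly as follows. This is a minimal sketch under stated assumptions: the two in-memory frames stand in for monthly CSV files, and the column names, the address layout ("street, city, state zip"), and the helper `plot_monthly_sales` are all hypothetical.

```python
import pandas as pd

# Two hypothetical monthly extracts (stand-ins for separate CSV files).
jan = pd.DataFrame({
    "Product": ["Monitor", "USB-C Cable"],
    "Quantity Ordered": [1, 2],
    "Price Each": [150.0, 10.0],
    "Order Date": pd.to_datetime(["2019-01-05 09:15", "2019-01-20 19:40"]),
    "Purchase Address": ["1 Main St, Boston, MA 02215",
                         "9 Elm St, Austin, TX 73301"],
})
feb = pd.DataFrame({
    "Product": ["Phone", "USB-C Cable"],
    "Quantity Ordered": [1, 3],
    "Price Each": [600.0, 10.0],
    "Order Date": pd.to_datetime(["2019-02-02 11:05", "2019-02-14 20:30"]),
    "Purchase Address": ["5 Oak St, Boston, MA 02215",
                         "7 Pine St, Austin, TX 73301"],
})

# Concatenate the monthly frames into one DataFrame.
sales = pd.concat([jan, feb], ignore_index=True)

# Add derived columns.
sales["Month"] = sales["Order Date"].dt.month
sales["Sales"] = sales["Quantity Ordered"] * sales["Price Each"]

# Parse cells as strings with .str to extract the city.
sales["City"] = sales["Purchase Address"].str.split(",").str[1].str.strip()

# Use .apply() with a function to extract the state code.
sales["State"] = sales["Purchase Address"].apply(
    lambda address: address.split(",")[2].split()[0]
)

# Aggregate with groupby.
monthly_sales = sales.groupby("Month")["Sales"].sum()
city_units = sales.groupby("City")["Quantity Ordered"].sum()
best_month = monthly_sales.idxmax()


def plot_monthly_sales(totals):
    """Draw a labeled bar chart of sales per month (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    fig, ax = plt.subplots()
    ax.bar(totals.index, totals.values)
    ax.set_xticks(list(totals.index))
    ax.set_xlabel("Month number")
    ax.set_ylabel("Sales (USD)")
    ax.set_title("Total sales per month")
    return fig, ax
```

Calling `plot_monthly_sales(monthly_sales)` followed by `matplotlib.pyplot.show()` would display the chart; swapping `ax.bar` for `ax.plot` gives a line graph instead.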


Languages

  • Jupyter Notebook 99.4%
  • Python 0.6%