🏭 Web Scraping: largest chemical producers worldwide

This repository contains an end-to-end data analysis project, written in Python, on the largest chemical companies in the world.

Table of contents

Introduction

This data analysis project is focused on web scraping Wikipedia data using Python.

The purpose of this work is to extract data from Wikipedia on the world's largest chemical-producing companies in 2021 and to derive meaningful information from it.

The analysis is carried out in Python and covers the essential steps of a data workflow: data extraction, data cleaning, data exploration, data analysis and data visualization.

Goal

The overall objective of this project is to scrape from Wikipedia all relevant information on the largest chemical producers by sales in 2021. Once collected, the data is preprocessed, explored and analyzed, and the significant results are visualized.

The analysis is expected to identify the following:

Best-selling chemical companies in the world
Fastest growing companies in the world
Countries with the largest number of successful chemical companies in the world
Best-selling German chemical companies
Fastest growing German chemical companies

Project overview

Web scrape data from Wikipedia using Python
Perform data preprocessing to clean and prepare the scraped data
Explore the data to find missing values, outliers, anomalies or patterns
Visualize the data to communicate the information obtained during the analysis
Draw conclusions

Dependencies

The following tools are necessary to carry out this project:

Python 3
Jupyter Notebooks
Python libraries:
- BeautifulSoup
- Requests
- NumPy
- pandas
- Matplotlib (pyplot)
- Seaborn

Technical skills

The following skills were used throughout the implementation of this project:

Web scraping
Data cleaning
Data exploration
Data visualization

Data set

Data was collected by scraping Wikipedia with the BeautifulSoup Python library.
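
The scraping step could look roughly like the minimal sketch below. It is not the notebook's actual code: the HTML snippet, the `wikitable` class, the column names other than "Headquarters" and the company rows are all placeholders chosen for illustration, and `parse_table` is a hypothetical helper. In practice the HTML string would come from a call such as `requests.get(url).text` on the relevant Wikipedia page.

```python
from bs4 import BeautifulSoup

# Placeholder HTML standing in for the downloaded page (assumption: the data
# sits in a table with class "wikitable"). Rows and values are made up.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Rank</th><th>Company</th><th>Sales</th><th>Change</th><th>Headquarters</th></tr>
  <tr><td>1</td><td>Company A</td><td>1,234.5</td><td>12.3%</td><td>City X, Germany</td></tr>
  <tr><td>2</td><td>Company B</td><td>987.0</td><td>-4.5%</td><td>City Y, Japan</td></tr>
</table>
"""


def parse_table(html):
    """Return (header, rows) for the first table with class 'wikitable'."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    if table is None:
        raise ValueError("no table with class 'wikitable' found")

    # Column titles come from the <th> cells.
    header = [th.get_text(strip=True) for th in table.find_all("th")]

    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has no <td> cells, so it is skipped
            rows.append(cells)
    return header, rows


header, rows = parse_table(SAMPLE_HTML)
print(header)
print(rows)
```

The resulting `header` and `rows` can be passed to `pandas.DataFrame(rows, columns=header)` for the later steps.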

The data set consists of:

50 entries
5 columns

Data source: Wikipedia

Data cleaning

To ensure the integrity and reliability of the data obtained by scraping the Wikipedia website, it was necessary to clean it.

To this end, column names were modified to make them more informative and improve readability. Numeric values were converted from strings to integer and float types, with minor adjustments so that they conform to the metric decimal system. Values that had been transferred with errors were also corrected to ensure their reliability.
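
As an illustration of this kind of cleaning, here is a minimal pandas sketch. The column names, the renamed targets and the values are hypothetical, and the specific fixes shown (removing thousands separators, replacing a Unicode minus sign, stripping a percent sign) are assumptions about what such a table may need rather than a record of what the notebook does.

```python
import pandas as pd

# Placeholder rows; column names and values are illustrative only.
raw = pd.DataFrame({
    "Rank": ["1", "2"],
    "Chemical sales (USD millions)": ["1,234.5", "987.0"],
    "Change from previous year (%)": ["12.3%", "\u22124.5%"],
})

# Shorter, more descriptive column names (hypothetical choices).
df = raw.rename(columns={
    "Rank": "rank",
    "Chemical sales (USD millions)": "sales_usd_millions",
    "Change from previous year (%)": "change_pct",
})

# String -> integer.
df["rank"] = df["rank"].astype(int)

# Drop thousands separators, then string -> float.
df["sales_usd_millions"] = (
    df["sales_usd_millions"].str.replace(",", "", regex=False).astype(float)
)

# Replace the Unicode minus sign with an ASCII hyphen, strip "%", then -> float.
df["change_pct"] = (
    df["change_pct"]
    .str.replace("\u2212", "-", regex=False)
    .str.rstrip("%")
    .astype(float)
)

print(df.dtypes)
```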

Data exploration

To ensure that the results of the analysis are accurate and reliable, it is essential to handle duplicate values, missing values and outliers, and to look for inconsistencies and patterns in the data.

No missing, duplicate or out-of-range values were found in this data set. The incorrectly transferred values were fixed in the previous step and no more errors were found. In addition, a new feature was created from the “Headquarters” column to obtain more precise information on the country of origin of the companies.
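
The checks and the derived country feature described above might be sketched as follows. The data frame is a placeholder, and the sketch assumes that each "Headquarters" value has the form "City, Country", so that the text after the last comma is the country; the real column may need different handling.

```python
import pandas as pd

# Placeholder data; only the "Headquarters" column name comes from the project.
df = pd.DataFrame({
    "company": ["Company A", "Company B", "Company C"],
    "sales_usd_millions": [1234.5, 987.0, 650.0],
    "Headquarters": ["City X, Germany", "City Y, Japan", "City Z, Germany"],
})

# Missing values per column, fully duplicated rows, and a simple range check.
missing_per_column = df.isna().sum()
duplicate_rows = int(df.duplicated().sum())
negative_sales = int((df["sales_usd_millions"] < 0).sum())

# New feature: country taken as the text after the last comma
# (assumption: values look like "City, Country").
df["country"] = df["Headquarters"].str.rsplit(",", n=1).str[-1].str.strip()

print(missing_per_column)
print(duplicate_rows, negative_sales)
print(df["country"].tolist())
```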

Data visualization

Data visualization plays a crucial role in data analysis, as it is the stage at which the conclusions drawn from the analysis are effectively communicated.

This stage focuses on creating visual representations of the insights gained during the analysis. The Python libraries Matplotlib and Seaborn were used for this purpose.
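
A minimal sketch of one such chart, a horizontal bar chart of companies ranked by sales, is shown here using Matplotlib only. The data and column names are placeholders, and the chart type is an assumption about what suits this data rather than a copy of the notebook's figures. Seaborn's `barplot` could be used in place of `ax.barh`.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; not needed inside a Jupyter notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data; names and values are illustrative only.
df = pd.DataFrame({
    "company": ["Company A", "Company B", "Company C"],
    "sales_usd_millions": [1234.5, 987.0, 650.0],
})

# Sort so that the best-selling company is drawn first.
top = df.sort_values("sales_usd_millions", ascending=False)

fig, ax = plt.subplots(figsize=(6, 3))
ax.barh(top["company"], top["sales_usd_millions"])
ax.invert_yaxis()  # put the largest bar at the top
ax.set_xlabel("Sales (USD millions, placeholder values)")
ax.set_title("Chemical companies by sales")
fig.tight_layout()
```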

Conclusion

In conclusion, this project successfully analyzed the world's leading chemical producing companies using Python.

As intended, the analysis identified the following:

Best-selling chemical companies in the world
Fastest growing companies in the world
Countries with the largest number of successful chemical companies in the world
Best-selling German chemical companies
Fastest growing German chemical companies
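
Each of these points maps onto a short pandas operation. The sketch below uses placeholder data and hypothetical column names (`sales_usd_millions`, `change_pct`, `country`), and takes "fastest growing" to mean the largest year-over-year change, which is an assumption.

```python
import pandas as pd

# Placeholder data; names and values are illustrative only.
df = pd.DataFrame({
    "company": ["Company A", "Company B", "Company C", "Company D"],
    "sales_usd_millions": [1234.5, 987.0, 650.0, 400.0],
    "change_pct": [12.3, -4.5, 30.0, 8.0],
    "country": ["Germany", "Japan", "Germany", "United States"],
})

# Best-selling and fastest growing companies overall (top 2 for brevity).
top_by_sales = df.nlargest(2, "sales_usd_millions")
fastest_growing = df.nlargest(2, "change_pct")

# Number of listed companies per country.
companies_per_country = df["country"].value_counts()

# The same rankings restricted to German companies.
german = df[df["country"] == "Germany"]
top_german_by_sales = german.nlargest(1, "sales_usd_millions")
fastest_growing_german = german.nlargest(1, "change_pct")

print(top_by_sales[["company", "sales_usd_millions"]])
print(companies_per_country)
```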

Looking ahead, there are several areas that are worthy of further exploration:

Develop advanced predictive models: Use machine learning algorithms to build predictive models that can accurately assess annual growth and revenue for these chemical companies.
Expand the data set: Increase the robustness of the analysis by incorporating a larger and more diverse data set.
