This project analyzes biodiversity data from the National Park Service, focusing on species observed across various parks. Conducted in Jupyter Notebook, the analysis uses pandas, NumPy, matplotlib, seaborn, and SciPy's chi2_contingency for data processing, statistical testing, and visualization.
The goal is to explore patterns in conservation status, endangered species, and species distribution across parks.
- What is the distribution of conservation status for species?
- Are certain types of species more likely to be endangered?
- Are the differences between species and their conservation status statistically significant?
- Which animal is most prevalent, and how are its sightings distributed across parks?
The project is based on two datasets provided by Codecademy:
- species_info.csv: Contains species classification, scientific names, and conservation status.
- observations.csv: Records species sightings across national parks over the past seven days.
These datasets are inspired by real-world data and help in understanding biodiversity trends across different regions.
- Python
- Jupyter Notebook
- pandas (Data processing & analysis)
- NumPy (Numerical computation)
- matplotlib & seaborn (Data visualization)
- SciPy chi2_contingency (Statistical testing)
- The species_info.csv dataset contains 5,824 rows and 4 columns (Category, Scientific Name, Common Names, Conservation Status).
- The observations.csv dataset contains 23,296 rows and 3 columns (Scientific Name, Park Name, Observations).
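A minimal sketch of loading the two files and checking their columns with pandas follows. It assumes the CSV headers are snake_case (`category`, `scientific_name`, `common_names`, `conservation_status`, `park_name`, `observations`); the helper name `load_datasets` and the two-row toy CSVs are hypothetical stand-ins for the real files, so the printed shapes will not match the row counts above.

```python
import io

import pandas as pd

# Assumed headers; adjust if the real CSV files name their columns differently.
SPECIES_COLUMNS = {"category", "scientific_name", "common_names", "conservation_status"}
OBSERVATION_COLUMNS = {"scientific_name", "park_name", "observations"}


def load_datasets(species_src, observations_src):
    """Read both CSV sources and fail early if an expected column is missing."""
    species = pd.read_csv(species_src)
    observations = pd.read_csv(observations_src)
    checks = [
        ("species_info", species, SPECIES_COLUMNS),
        ("observations", observations, OBSERVATION_COLUMNS),
    ]
    for name, df, expected in checks:
        missing = expected - set(df.columns)
        if missing:
            raise ValueError(f"{name} is missing columns: {sorted(missing)}")
    return species, observations


# Toy in-memory stand-ins; with the real files this would be
# load_datasets("species_info.csv", "observations.csv")
species_csv = io.StringIO(
    "category,scientific_name,common_names,conservation_status\n"
    "Mammal,Species A,Toy Name A,Endangered\n"
    "Bird,Species B,Toy Name B,\n"
)
observations_csv = io.StringIO(
    "scientific_name,park_name,observations\n"
    "Species A,Park X,85\n"
    "Species B,Park Y,120\n"
)
species, observations = load_datasets(species_csv, observations_csv)
print(species.shape, observations.shape)
```

An empty status field is read as NaN, which is how species without any conservation status show up in the sketches that follow.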
- The majority of species are not under conservation efforts, while a small subset falls under statuses such as Endangered, Threatened, and Species of Concern.
- A pie chart was created to visualize the distribution of conservation status.
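The conservation-status distribution behind the pie chart can be computed roughly as below. This is a sketch under assumptions: a missing `conservation_status` is treated as "not under conservation", `No Intervention` is a hypothetical label for that group, and the five-row DataFrame is toy data rather than the project's dataset.

```python
import pandas as pd


def conservation_distribution(species, no_status_label="No Intervention"):
    """Count species per conservation status, grouping missing statuses under one label."""
    # Hypothetical label: species with no recorded status are treated as unprotected
    statuses = species["conservation_status"].fillna(no_status_label)
    return statuses.value_counts()


species = pd.DataFrame({
    "category": ["Mammal", "Bird", "Bird", "Fish", "Vascular Plant"],
    "conservation_status": ["Endangered", None, "Species of Concern", None, None],
})
counts = conservation_distribution(species)
shares = counts / counts.sum()  # fraction of species per status, the quantity a pie chart shows
print(shares.round(2))
```

Passing `counts` to `counts.plot(kind="pie")` would draw the chart itself.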
- The dataset contains 7 unique categories: Mammals, Birds, Reptiles, Amphibians, Fish, Vascular Plants, and Nonvascular Plants.
- Mammals and birds were found to have the highest percentage of protected species.
- Chi-squared tests were performed to determine whether conservation status differs significantly between categories.
- Results show that mammals and reptiles exhibit a statistically significant difference in conservation status, while mammals and birds do not.
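One way to reproduce the per-category protection rate and a pairwise chi-squared test is sketched below, assuming a species counts as protected whenever `conservation_status` is not missing. The function names and toy counts are hypothetical. For a 2x2 table, `scipy.stats.chi2_contingency` applies Yates' continuity correction by default, and with expected counts this small the chi-squared approximation is unreliable, so the toy p-value is only illustrative.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def protection_rates(species):
    """Percentage of species in each category that have any conservation status."""
    protected = species["conservation_status"].notna()
    return (protected.groupby(species["category"]).mean() * 100).sort_values(ascending=False)


def compare_categories(species, cat_a, cat_b, alpha=0.05):
    """Chi-squared test of protected vs. not protected counts for two categories."""
    subset = species[species["category"].isin([cat_a, cat_b])]
    protection = subset["conservation_status"].notna().map(
        {True: "protected", False: "not protected"}
    )
    table = pd.crosstab(subset["category"], protection)
    # Fix row/column order and fill any absent combination with zero
    table = table.reindex(
        index=[cat_a, cat_b], columns=["protected", "not protected"], fill_value=0
    )
    chi2, p_value, dof, expected = chi2_contingency(table.to_numpy())
    return table, p_value, p_value < alpha


# Toy data: 10 species per category with 3, 1 and 2 protected respectively
species = pd.DataFrame({
    "category": ["Mammal"] * 10 + ["Reptile"] * 10 + ["Bird"] * 10,
    "conservation_status": (["Endangered"] * 3 + [None] * 7
                            + ["Threatened"] + [None] * 9
                            + ["Species of Concern"] * 2 + [None] * 8),
})
rates = protection_rates(species)
table, p_value, significant = compare_categories(species, "Mammal", "Reptile")
print(rates)
print(table)
print(f"p = {p_value:.3f}, significant at 0.05: {significant}")
```

Reducing each category to a protected/not-protected split keeps every comparison a 2x2 table, which matches the pairwise framing (mammals vs. reptiles, mammals vs. birds) of the results above.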
- The Deep-Root Clubmoss had the highest number of sightings, while the Golden Corydalis was the least observed species.
- Bats were identified as the most frequently occurring mammals, with Yellowstone National Park having the highest bat sightings.
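The sightings rankings could be derived along these lines, assuming snake_case columns (`category`, `scientific_name`, `common_names`, `park_name`, `observations`). The whole-word `Bat` match on `common_names`, the de-duplication by `scientific_name`, and the placeholder species and park names are assumptions for illustration, not necessarily how the notebook does it.

```python
import pandas as pd


def most_and_least_observed(observations):
    """Return the scientific names with the highest and lowest total sightings."""
    totals = observations.groupby("scientific_name")["observations"].sum()
    return totals.idxmax(), totals.idxmin()


def bat_sightings_by_park(species, observations):
    """Total bat sightings per park, sorted from most to fewest."""
    mammals = species[species["category"] == "Mammal"].drop_duplicates(subset="scientific_name")
    # Assumption: a mammal is a bat if "Bat" or "Bats" appears as a whole word in its common names
    is_bat = mammals["common_names"].str.contains(r"\bBats?\b", regex=True, na=False)
    bats = mammals.loc[is_bat, ["scientific_name"]]
    merged = observations.merge(bats, on="scientific_name", how="inner")
    return merged.groupby("park_name")["observations"].sum().sort_values(ascending=False)


# Placeholder data, not taken from the project
species = pd.DataFrame({
    "category": ["Mammal", "Mammal", "Mammal"],
    "scientific_name": ["Species A", "Species B", "Species C"],
    "common_names": ["Little Toy Bat", "Toy Batfish", "Toy Coyote"],
})
observations = pd.DataFrame({
    "scientific_name": ["Species A", "Species A", "Species B", "Species C"],
    "park_name": ["Park X", "Park Y", "Park X", "Park Y"],
    "observations": [100, 40, 500, 5],
})
most, least = most_and_least_observed(observations)
bat_parks = bat_sightings_by_park(species, observations)
print(most, least)
print(bat_parks)
```

The word-boundary pattern keeps a name like `Toy Batfish` from being counted as a bat, which a plain substring search for `Bat` would not.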
- The dataset includes data from four national parks:
  - Great Smoky Mountains
  - Yosemite
  - Bryce Canyon
  - Yellowstone
- The number of distinct species observed was identical in every park, which suggests the data was synthetically generated.
- Sightings of protected species outnumbered those of non-protected species in every park except Great Smoky Mountains National Park.
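A sketch of both park-level checks follows: counting distinct species per park, and splitting total sightings into protected and non-protected species. It assumes snake_case columns (`scientific_name`, `park_name`, `observations`, `conservation_status`), keeps the first status when a scientific name is listed more than once, and runs on placeholder parks and species; the function names are hypothetical.

```python
import pandas as pd


def species_per_park(observations):
    """Number of distinct species recorded in each park."""
    return observations.groupby("park_name")["scientific_name"].nunique()


def protected_sightings_by_park(species, observations):
    """Total sightings per park, split into protected and not protected species."""
    # Assumption: when a scientific name is listed more than once, keep its first status
    status = species.drop_duplicates(subset="scientific_name")[["scientific_name", "conservation_status"]]
    merged = observations.merge(status, on="scientific_name", how="left")
    merged["protection"] = merged["conservation_status"].notna().map(
        {True: "protected", False: "not protected"}
    )
    return merged.pivot_table(
        index="park_name", columns="protection", values="observations", aggfunc="sum", fill_value=0
    )


# Placeholder parks and species, not project data
species = pd.DataFrame({
    "scientific_name": ["Species A", "Species B"],
    "conservation_status": ["Threatened", None],
})
observations = pd.DataFrame({
    "scientific_name": ["Species A", "Species B", "Species A", "Species B"],
    "park_name": ["Park X", "Park X", "Park Y", "Park Y"],
    "observations": [50, 20, 10, 40],
})
species_counts = species_per_park(observations)
by_park = protected_sightings_by_park(species, observations)
print("Same species count in every park:", species_counts.nunique() == 1)
print(by_park)
```

De-duplicating before the merge matters because a repeated scientific name in the species table would otherwise duplicate every matching observation row and inflate the sums.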
- Bar charts comparing species counts across parks.
- Stacked bar chart showing species categories under conservation.
- Pie charts visualizing conservation status distribution.
- Statistical plots comparing endangered species proportions.
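For the stacked bar chart, a minimal matplotlib sketch via pandas plotting might look like this, assuming snake_case columns `category` and `conservation_status`; the function name, axis labels, and toy rows are hypothetical. The `Agg` backend line only matters when running headless; in a notebook it can be dropped so the chart renders inline.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_protected_by_category(species):
    """Stacked bar chart of protected species per category, stacked by conservation status."""
    protected = species.dropna(subset=["conservation_status"])
    counts = pd.crosstab(protected["category"], protected["conservation_status"])
    ax = counts.plot(kind="bar", stacked=True, figsize=(8, 5), rot=0)
    ax.set_xlabel("Category")
    ax.set_ylabel("Number of species")
    ax.set_title("Species under conservation by category")
    plt.tight_layout()
    return ax, counts


# Toy rows; categories with no protected species (Fish here) drop out of the chart
species = pd.DataFrame({
    "category": ["Mammal", "Mammal", "Mammal", "Bird", "Bird", "Fish"],
    "conservation_status": ["Endangered", "Species of Concern", None,
                            "Species of Concern", "Species of Concern", None],
})
ax, counts = plot_protected_by_category(species)
print(counts)
```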
✔ Most species in the dataset are not under conservation efforts.
✔ Mammals and birds have the highest percentage of protected species.
✔ There is a significant difference in conservation status between mammals and reptiles.
✔ Bats are the most commonly observed mammals, especially in Yellowstone National Park.
✔ The identical species count across parks suggests that the data is artificially generated.
🔹 Incorporate a time-based dataset to analyze trends in species conservation over the years.
🔹 Include the area of each national park to assess biodiversity density.
🔹 Conduct spatial analysis to identify clustered distributions of species.
- Clone the repository:
git clone https://github.com/Sabdikay/Analysis-of-Biodiversity.git
cd Analysis-of-Biodiversity
- Install required dependencies:
pip install pandas numpy matplotlib seaborn scipy
- Run the Jupyter Notebook:
jupyter notebook
- Open and execute the analysis notebook.
This project successfully analyzed biodiversity data, highlighting conservation trends and species distribution patterns across national parks. Through statistical testing and visualizations, I uncovered key insights about endangered species and national park ecosystems.
This analysis can serve as a foundation for more in-depth studies on biodiversity, wildlife conservation, and ecological trends in protected areas.
📊 Explore the project and contribute to further research! 🚀