This repository contains a series of projects from my bootcamp and examples of data analysis using Python. In this repository, we explore various data analysis techniques and tools, focusing on the most popular and powerful Python libraries.

KnEl1a/Data-Analysis-with-Python

Data Analysis

My certificate from the freeCodeCamp Programming BootCamp

Technologies Used: Python 3.11, JupyterLab (IDE) for testing, and VS Code (IDE) for structuring each algorithm.

Libraries: pandas, numpy, scikit-learn, matplotlib, and seaborn.

Project Descriptions:

Sea Level Predictor

The code loads historical sea level data from a CSV file, fits two linear regressions (one on the data from 1880 onward and one on the data from 2000 onward), creates a scatter plot with both lines of best fit, and saves the plot as "sea_level_plot.png". It is useful for analyzing and visualizing how sea levels have changed over time.

SeaLevel
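The two-regression idea can be sketched as follows. This is a minimal illustration, not the project's code: it uses `numpy.polyfit` for the least-squares line (the project may use a different regression routine), the helper name `fit_trend` is hypothetical, and the data series is synthetic.

```python
import numpy as np


def fit_trend(years, levels, start_year=None):
    """Fit a least-squares line to (year, level) points.

    If start_year is given, only points from that year onward are used.
    Returns (slope, intercept).
    """
    years = np.asarray(years, dtype=float)
    levels = np.asarray(levels, dtype=float)
    if start_year is not None:
        keep = years >= start_year
        years, levels = years[keep], levels[keep]
    slope, intercept = np.polyfit(years, levels, deg=1)
    return slope, intercept


# Synthetic series, made up for illustration (NOT the real dataset):
# rises 0.1 units per year until 2000, then 0.3 units per year.
years = np.arange(1880, 2014)
levels = np.where(years < 2000,
                  0.1 * (years - 1880),
                  12.0 + 0.3 * (years - 2000))

slope_all, intercept_all = fit_trend(years, levels)
slope_recent, intercept_recent = fit_trend(years, levels, start_year=2000)

# Extend each line to a future year by evaluating slope * year + intercept
prediction_recent_2050 = slope_recent * 2050 + intercept_recent
```

Fitting the recent subset separately lets the second line follow a change in the rate of rise that the full-period fit averages out.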

Mean Variance

The calculate function in mean_var_std.py performs basic statistical calculations on a list of nine numbers. It uses NumPy to calculate the mean, variance, standard deviation, maximum, minimum, and sum of the numbers. The function returns a dictionary containing these results, keyed by statistic.
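A possible shape for such a function is sketched below. The name `calculate_sketch`, the 3x3 reshaping, the dictionary keys, and the per-axis layout of each entry are assumptions made for illustration; the repository's function may organize its output differently.

```python
import numpy as np


def calculate_sketch(numbers):
    """Return basic statistics for a list of nine numbers.

    Assumption: the numbers are arranged as a 3x3 matrix and each statistic
    is reported as [per column, per row, whole matrix].
    """
    if len(numbers) != 9:
        raise ValueError("List must contain nine numbers.")
    matrix = np.array(numbers, dtype=float).reshape(3, 3)
    stats = {
        "mean": np.mean,
        "variance": np.var,
        "standard deviation": np.std,
        "max": np.max,
        "min": np.min,
        "sum": np.sum,
    }
    return {
        name: [fn(matrix, axis=0).tolist(),  # one value per column
               fn(matrix, axis=1).tolist(),  # one value per row
               float(fn(matrix))]            # single value for all nine numbers
        for name, fn in stats.items()
    }


result = calculate_sketch([0, 1, 2, 3, 4, 5, 6, 7, 8])
```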

Demographic Analyzer

The calculate_demographic_data function in demographic_data_analyzer.py processes demographic data from a CSV file and calculates statistics such as the count of each race represented, the average age of men, the percentage of people with university degrees, and the percentage of people earning more than 50 thousand dollars a year, among others. It uses Pandas to manipulate and analyze the tabular data.
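A few of these statistics can be sketched with Pandas as shown below. The column names (`age`, `race`, `sex`, `education`, `salary`), the category labels, and the tiny inline table are all assumptions for illustration; the real function reads its data from a CSV file, and "university degree" is approximated here by a single `Bachelors` label.

```python
import pandas as pd

# Tiny made-up table standing in for the CSV (hypothetical columns and values)
df = pd.DataFrame({
    "age": [39, 50, 38, 28],
    "race": ["White", "White", "Black", "Black"],
    "sex": ["Male", "Male", "Male", "Female"],
    "education": ["Bachelors", "HS-grad", "Bachelors", "Masters"],
    "salary": ["<=50K", ">50K", "<=50K", ">50K"],
})

# Number of rows per race
race_count = df["race"].value_counts()

# Average age of the rows where sex is "Male"
average_age_men = round(df.loc[df["sex"] == "Male", "age"].mean(), 1)

# The mean of a boolean column is the fraction of True values
percentage_bachelors = round((df["education"] == "Bachelors").mean() * 100, 1)
percentage_over_50k = round((df["salary"] == ">50K").mean() * 100, 1)
```

Taking the mean of a boolean Series avoids a separate count-and-divide step for each percentage.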

Medical Data Analyzer

For the draw_cat_plot and draw_heat_map functions in the medical_data_visualizer.py file:

draw_cat_plot: This function uses Seaborn to create a categorical bar plot showing the distribution of characteristics such as cholesterol, glucose, smoking habits, alcohol consumption, physical activity, and overweight, split by the presence or absence of cardiovascular disease. The function saves the generated plot in a file named 'catplot.png'.

catPlot
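A categorical plot like this usually needs the data in long format first. The sketch below shows only that reshaping and counting step with Pandas; the column names (`cardio`, `smoke`, `active`) and the values are hypothetical, and the plotting call itself is left out.

```python
import pandas as pd

# Made-up records: 1 = yes, 0 = no (hypothetical columns)
df = pd.DataFrame({
    "cardio": [0, 0, 1, 1, 1],
    "smoke":  [0, 1, 0, 1, 1],
    "active": [1, 1, 0, 0, 1],
})

# Wide -> long: one row per (patient, characteristic) pair
long_df = pd.melt(df, id_vars=["cardio"], value_vars=["smoke", "active"])

# Count how often each value occurs for each characteristic, per cardio group
counts = (
    long_df.groupby(["cardio", "variable", "value"])
    .size()
    .reset_index(name="total")
)
```

A table shaped like `counts` could then be handed to `seaborn.catplot` with `kind="bar"`, using `variable` on the x-axis, `total` on the y-axis, `value` as the hue, and `cardio` as the column facet.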

draw_heat_map: This function uses Seaborn to generate a heatmap that shows the correlation between various medical characteristics. The heatmap visually represents the relationships between variables, helping to identify patterns and associations in the data. The function saves the generated heatmap in a file named 'heatmap.png'.

heatMap
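The numbers behind such a heatmap come from a correlation matrix. The sketch below computes one with Pandas and builds a mask that hides the redundant upper triangle; the mask is a common convention and an assumption here, as are the column names and values.

```python
import numpy as np
import pandas as pd

# Made-up measurements (hypothetical columns)
df = pd.DataFrame({
    "height": [150, 160, 170, 180],
    "weight": [50, 60, 70, 80],
    "ap_hi": [120, 130, 110, 140],
})

# Pairwise Pearson correlation between all numeric columns
corr = df.corr()

# True marks cells to hide: the diagonal and everything above it,
# since the matrix is symmetric and the upper half repeats the lower half
mask = np.triu(np.ones(corr.shape, dtype=bool))
```

Both objects can be passed to `seaborn.heatmap(corr, mask=mask, annot=True)` to draw the lower triangle with the coefficients printed in each cell.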

Page-View

For the draw_line_plot, draw_bar_plot, and draw_box_plot functions in the time_series_visualizer.py file:

draw_line_plot: This function uses Matplotlib to create a line plot of the daily page views of the freeCodeCamp forum over a specific time period, showing how traffic changes over time. The function saves the generated plot in a file named 'line_plot.png'.

Lineplot
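A minimal Matplotlib sketch of a daily line plot follows. The series is synthetic, the `value` column name and the labels are assumptions, and the save call is commented out so the example does not write to disk.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily series, made up for illustration (not the forum data)
dates = pd.date_range("2020-01-01", periods=60, freq="D")
df = pd.DataFrame({"value": np.linspace(1000, 2000, len(dates))}, index=dates)

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(df.index, df["value"], color="tab:red", linewidth=1)
ax.set_title("Daily page views (synthetic data)")
ax.set_xlabel("Date")
ax.set_ylabel("Page Views")

# fig.savefig("line_plot.png")  # uncomment to write the image to disk
```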

draw_bar_plot: This function uses Matplotlib and Pandas to create a bar plot showing the monthly average page views of the freeCodeCamp forum over several years. The bar plot provides a visualization of the average number of page views for each month over time. The function saves the generated plot in a file named 'bar_plot.png'.

barPlot
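The monthly averages behind a grouped bar plot can be computed with a Pandas group-by, sketched below. The data is synthetic (each day's value equals its month number, so the expected averages are easy to check), and the column names are assumptions.

```python
import pandas as pd

# Synthetic daily series covering two full years (not the forum data);
# each day's value is simply its month number
idx = pd.date_range("2020-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"value": idx.month.to_numpy().astype(float)}, index=idx)

# Derive grouping keys from the date index
df["year"] = df.index.year
df["month"] = df.index.month

# Average per (year, month), then pivot months into columns:
# one row per year, one column per month
monthly_avg = df.groupby(["year", "month"])["value"].mean().unstack()
```

Calling `monthly_avg.plot(kind="bar")` on a table of this shape draws one group of bars per year with one bar per month.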

draw_box_plot: This function uses Seaborn and Pandas to create two box plots showing the distribution of page views of the freeCodeCamp forum based on year and month. The box plots provide information on the trend and seasonality of page views over time. The function saves the generated box plots in a file named 'box_plot.png'.

boxPlot
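Box plots by year and by month need one label column for each grouping. The sketch below adds those columns and computes the quartiles that a year-wise box would display; the data is synthetic and the column names are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic daily series covering two full years (not the forum data)
idx = pd.date_range("2020-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"value": np.arange(len(idx), dtype=float)}, index=idx)

# Label columns used as the x-axis of the two box plots
df["year"] = df.index.year
df["month"] = df.index.month_name().str[:3]  # "Jan", "Feb", ...

# The box in a box plot spans the 25th to 75th percentile, with a line at the median
quartiles = df.groupby("year")["value"].quantile([0.25, 0.5, 0.75]).unstack()
```

With these columns in place, `seaborn.boxplot(data=df, x="year", y="value")` and `seaborn.boxplot(data=df, x="month", y="value")` would draw the year-wise and month-wise plots.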

Final Conclusion

Throughout this process of data analysis and visualization, I have had the opportunity to apply a variety of tools and techniques to extract valuable information from diverse datasets. From basic statistical calculations to exploring correlations and temporal trends, each project has significantly contributed to my understanding and experience in the field of data analysis using Python.

Through the freeCodeCamp Programming BootCamp certification, I have consolidated my skills with key libraries such as pandas, numpy, scikit-learn, matplotlib, and seaborn.

From calculating mean, variance, and standard deviation in the Mean Variance project, to detailed demographic analysis in Demographic Analyzer, and exploring correlation and distribution in Medical Data Analyzer and Page-View, each project has been an invaluable opportunity to apply and improve my technical skills and understanding of fundamental concepts in data analysis.
