This project applies several clustering techniques, together with supervised learning, to the analysis of team performance in the World Cup. The methods covered are K-Means, DBSCAN, Gaussian Mixture Models (GMM), and Agglomerative Clustering for clustering, and K-Nearest Neighbors for supervised learning.
The dataset used in this project contains information such as:
- Position: Team's ranking position
- Team: Name of the team
- Games Played: Total number of games played
- Win: Total number of wins
- Draw: Total number of draws
- Loss: Total number of losses
- Goals For: Total goals scored by the team
- Goals Against: Total goals conceded by the team
- Goal Difference: Difference between goals scored and conceded
- Points: Total points accumulated
- Year: Year of the competition
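Several of the columns above are derived from others, so a quick consistency check can be useful before clustering. The following is a minimal sketch, assuming the data is loaded into a pandas DataFrame whose column names match the list exactly. The rows are hypothetical placeholders, not values from the project's dataset, and `check_consistency` is an illustrative helper, not part of the repository.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from the project's dataset.
# The third row carries a deliberately wrong Goal Difference.
rows = [
    {"Position": 1, "Team": "Team A", "Games Played": 7, "Win": 6, "Draw": 1, "Loss": 0,
     "Goals For": 15, "Goals Against": 4, "Goal Difference": 11, "Points": 19, "Year": 2018},
    {"Position": 2, "Team": "Team B", "Games Played": 7, "Win": 4, "Draw": 2, "Loss": 1,
     "Goals For": 12, "Goals Against": 6, "Goal Difference": 6, "Points": 14, "Year": 2018},
    {"Position": 3, "Team": "Team C", "Games Played": 3, "Win": 0, "Draw": 1, "Loss": 2,
     "Goals For": 2, "Goals Against": 5, "Goal Difference": 3, "Points": 1, "Year": 2018},
]
df = pd.DataFrame(rows)


def check_consistency(frame: pd.DataFrame) -> pd.Series:
    """Return a boolean Series that is True where the derived columns agree."""
    # Every game ends in a win, a draw, or a loss.
    games_ok = frame["Games Played"] == frame["Win"] + frame["Draw"] + frame["Loss"]
    # Goal Difference is goals scored minus goals conceded.
    diff_ok = frame["Goal Difference"] == frame["Goals For"] - frame["Goals Against"]
    return games_ok & diff_ok


consistent = check_consistency(df)
# Show the rows that fail either check.
print(df.loc[~consistent, ["Team", "Year", "Goal Difference"]])
```

Points are left out of the check on purpose, since the number of points awarded per win depends on the rules in force for a given tournament.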
The main objective of this project is to apply clustering techniques to understand the structure of the data and the relationships among its variables. We aim to identify groups of similar teams, segment the data, and evaluate how machine learning algorithms perform in different scenarios, with an emphasis on teaching unsupervised learning techniques.
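As a rough illustration of the workflow described here, the sketch below fits the four clustering methods named in this README with scikit-learn and compares them by silhouette score. It is not the notebook's actual code: the data is synthetic, and the feature choice, `eps`, `min_samples`, and the cluster count of 2 are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 30

# Synthetic stand-in for the real dataset: two loosely defined groups of teams.
strong = pd.DataFrame({
    "Win": rng.integers(4, 8, n),
    "Goals For": rng.integers(10, 20, n),
    "Goals Against": rng.integers(1, 6, n),
})
weak = pd.DataFrame({
    "Win": rng.integers(0, 2, n),
    "Goals For": rng.integers(0, 5, n),
    "Goals Against": rng.integers(5, 12, n),
})
teams = pd.concat([strong, weak], ignore_index=True)

# Standardize so that no single column dominates the distance computations.
X = StandardScaler().fit_transform(teams)

# Hyperparameters below are illustrative guesses, not tuned values.
models = {
    "K-Means": KMeans(n_clusters=2, n_init=10, random_state=0),
    "GMM": GaussianMixture(n_components=2, random_state=0),
    "Agglomerative": AgglomerativeClustering(n_clusters=2),
    "DBSCAN": DBSCAN(eps=1.0, min_samples=5),
}

labels = {}
scores = {}
for name, model in models.items():
    labels[name] = model.fit_predict(X)
    # DBSCAN marks noise points with -1; exclude them before scoring.
    mask = labels[name] != -1
    if len(set(labels[name][mask])) >= 2:
        scores[name] = silhouette_score(X[mask], labels[name][mask])
    else:
        # The silhouette score is undefined with fewer than two clusters.
        scores[name] = float("nan")

for name, score in scores.items():
    print(f"{name}: silhouette = {score:.3f}")
```

The silhouette score is only one way to compare the methods. DBSCAN in particular is sensitive to `eps`, so its result on real data may differ considerably from what this toy example suggests.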
- Python
- Jupyter Notebook
- Libraries: pandas, NumPy, scikit-learn, Matplotlib, and Seaborn, among others.
- Clone the repository to your local machine:
    git clone https://github.com/cyblx/clustering.git
- Move into the cloned directory and install the required libraries:
    cd clustering
    pip install -r requirements.txt
- Start Jupyter Notebook and open the analysis notebook:
    jupyter notebook
- Follow the instructions within the notebook to explore the dataset and view the analysis results.
For more information, code, tutorials, and other projects, use the contact details and links below:
- Email: alves_lucasoliveira@usp.br
- GitHub: cyblx
- LinkedIn: Cyblx