I'm a master's student in AI focusing on Machine Learning and Natural Language Processing.
I'm passionate about continually improving my skills in Data Science and Machine Learning, with the aim of bringing effective solutions to real-world business problems.
During my master's studies, I realized that I enjoy leveraging data analytics and ML tools to drive business impact in BI environments. Alongside my studies, I therefore use the well-known BI tools Power BI and Tableau to gain insights from data and support business decision-making.
Languages
Data Analytics & ETL
Cloud Technologies
Machine Learning & NLP
IDEs & Notebooks
Other Technologies & Tools
This section is divided into two parts: 'Data Analysis' projects and 'Machine Learning & NLP' projects. You can click the given links to see further details of each project.
-
Analyzed a dataset of video game sales across different platforms from the 1980s to the 2010s. The dataset was taken from Kaggle and had originally been scraped from vgchartz. The data was first cleaned with pandas in Google Colab, and preliminary insights were then gained with matplotlib and seaborn. Finally, the cleaned data was exported to Tableau for further, in-depth visualizations.
-
California Infectious Diseases Analysis
Analyzed a dataset of selected communicable infectious diseases reported in the state of California between 2001 and 2022. The dataset was taken from the official Data.gov website. Using pandas and matplotlib, the trends of common diseases over the given years were examined, along with their distribution by gender and county.
-
Data analysis project to gain insight into how crime rates fluctuated in Chicago between 2001 and 2024, using the Python libraries pandas, numpy, matplotlib and seaborn. The analysis also covers the most common types of crime and the locations where they occur most often. The dataset was taken from Data.gov.
-
Connecticut Real Estate Analysis
Analyzed real estate sales in the state of Connecticut between 2001 and 2022 using Python and its libraries pandas, numpy and matplotlib. The main goals were to discover the general trend in sales amounts, which residence types are most in demand, how expensive the cities are and, relatedly, which locations are overvalued or undervalued.
-
Amsterdam Airbnb Data Analysis
An analysis of the Airbnb market in Amsterdam. The data was taken from Inside Airbnb, and the corresponding CSV files were analyzed with Microsoft Power BI. The main goals were to discover the trend of total listings over the years, to compare the neighborhoods in terms of price, and to identify the most popular hosts.
-
Turkish Sentiment Analyser - Hugging Face - Web App
Fine-tuned the distilled Turkish BERT model on a review classification dataset for sentiment analysis. The final model achieved 86% accuracy and was deployed to Hugging Face Spaces using Streamlit as an interactive web app. The app provides a no-code way for people to see whether a particular review is "positive" or "negative".
-
Toxic Comment Detector - Web App
Binary classification project to predict whether a comment is toxic. Three machine learning models were compared: Multinomial Naive Bayes, Logistic Regression, and Support Vector Machine. The best model was a Naive Bayes classifier with a TF-IDF vectorizer, with F1 and recall scores of 0.85 and 0.88, respectively. The application uses this model to predict the toxicity of comments.
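For readers less familiar with the metrics quoted above, here is a minimal sketch of how recall and F1 are computed for a binary toxic / non-toxic classifier. The labels and predictions are hypothetical toy values, not data from the project, and the helper name `precision_recall_f1` is made up for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the positive (toxic) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels: 1 = toxic, 0 = not toxic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Recall measures how many of the truly toxic comments are caught, which is often the priority for a moderation tool, while F1 balances it against precision.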
-
cst5 is a tiny T5 model for the Czech language, based on the smaller version of Google's mT5 model. It is meant to help people run experiments for Czech with a lightweight model rather than the massive mT5, which covers 101 languages. cst5 was obtained by retaining only the Czech and English embeddings of the mT5 model. Shrinking the original "sentencepiece" vocabulary from 250K to 30K tokens reduced the parameter count from 582M to 244M and the total size from 2.2GB to 0.9GB. cst5 thus allows fine-tuning for downstream Czech tasks with a smaller footprint and without any loss in quality relative to the original multilingual model.
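The vocabulary shrinking described above can be sketched roughly as follows. This is a plain-Python illustration rather than the actual cst5 code: `prune_embeddings` and `params_after_pruning` are hypothetical helper names, and the hidden size of 768 and the two separate (untied) embedding matrices are assumptions that are not stated in this README.

```python
def prune_embeddings(embeddings, keep_ids):
    """Keep only the embedding rows for the retained token ids.

    Returns the smaller table plus a map from old token id to new token id,
    which is needed to rewrite the tokenizer vocabulary consistently.
    """
    new_embeddings = [embeddings[i] for i in keep_ids]
    old_to_new = {old: new for new, old in enumerate(keep_ids)}
    return new_embeddings, old_to_new


def params_after_pruning(total_params, old_vocab, new_vocab,
                         hidden_size, n_embedding_matrices):
    """Estimate the parameter count after dropping unused vocabulary rows."""
    removed = (old_vocab - new_vocab) * hidden_size * n_embedding_matrices
    return total_params - removed


# Toy embedding table with 4 tokens and 2 dimensions (hypothetical values).
toy_table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
pruned, id_map = prune_embeddings(toy_table, keep_ids=[0, 2])
print(pruned, id_map)

# Rough check of the figures quoted above, under the stated assumptions:
# hidden size 768 and separate input and output embedding matrices.
remaining = params_after_pruning(
    total_params=582_000_000,
    old_vocab=250_000,
    new_vocab=30_000,
    hidden_size=768,
    n_embedding_matrices=2,
)
print(f"{remaining / 1e6:.0f}M parameters")
```

Under those assumptions the estimate lands close to the 244M figure, which is consistent with nearly all of the saving coming from the embedding matrices while the transformer layers stay untouched.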
-
Financial Sentiment Analysis with Machine Learning, LSTM, and BERT Transformer
Financial sentiment analysis project to predict whether a given financial text is positive, negative or neutral. Machine learning models, an LSTM, and a BERT transformer were used during the process. The best result was obtained with BERT, which achieved an accuracy of 0.77.