This project is our submission to Kavach Hackathon 2023 for the Phishing Detection Solution problem statement (ID: KVH-004).
Design and develop a technological solution for AI-enabled Phishing Links Detection and Alert System. The solution should be able to identify the source of phishing attacks in web pages, email apps, social media, instant messenger apps, text messages etc. The solution may be in the form of a desktop/mobile application or a web browser plugin.
Today, data is often called the new oil, and information, particularly user information, is extremely valuable. Personal data is any information that relates to us as individuals and to our identity. It includes anything that can identify us directly or indirectly, such as our name, address, date of birth, Aadhaar number, PAN card number, email address, phone number, financial information, health information and more. As attacks on and breaches of organizations and users keep rising, end users' data is increasingly at risk.
Phishing is one of the most common types of cyberattack. The following statistics show the gravity of the situation:
- Phishing attacks are responsible for 90% of data breaches.
- The global average cost of a data breach was $3.86 million in 2020.
- In 2020, there was a 22% increase in the number of phishing attacks compared to the previous year.
- One in every 99 emails is a phishing attack.
- Phishing emails account for 80% of all reported security incidents.
- 94% of malware is delivered via email.
A solution that detects phishing links and warns users before they open them is therefore much needed.
Because the number of phishing attacks keeps growing, identifying and flagging phishing links manually is not feasible. We therefore use machine learning to automate this process.
To train our models, we use the Malicious URLs dataset. Our workflow consists of the following steps:
- Loading the data - We load the dataset so that we can explore it and train on it.
- Familiarizing with data & EDA (Exploratory Data Analysis) - We perform EDA to understand the underlying structure of the data.
- Visualizing the data - We visualize the data to identify important correlations between features.
- Building and training the model - We trained the following models:
  - Logistic Regression
  - K-Nearest Neighbors
  - Support Vector Classifier
  - Naive Bayes
  - Decision Tree
  - Random Forest
  - Gradient Boosting
  - CatBoost
  - Multilayer Perceptron
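As a rough illustration of the loading, EDA and correlation steps above, the sketch below parses a CSV of URLs, reports the class balance and correlates a few lexical URL features with a phishing indicator. It assumes the dataset has a `url` column and a `type` label column (with values such as `benign` and `phishing`); the feature set (`url_length`, `num_dots`, `has_ip`, `has_at`, `num_digits`) and all helper names are hypothetical and not necessarily the features our model uses. Only the standard library is used.

```python
import csv
import io
import math
import re
from collections import Counter

# Assumed column names; adjust them to the actual CSV header.
URL_COLUMN = "url"
LABEL_COLUMN = "type"

# Matches a dotted IPv4 address at the start of the URL or after "/" or "@".
IP_PATTERN = re.compile(r"(^|[/@])(\d{1,3}\.){3}\d{1,3}")


def load_rows(text: str) -> list:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def class_balance(rows: list) -> Counter:
    """Count how many URLs carry each label (a first EDA check)."""
    return Counter(row[LABEL_COLUMN] for row in rows)


def lexical_features(url: str) -> dict:
    """Illustrative (hypothetical) lexical features computed from the raw URL."""
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "has_ip": int(bool(IP_PATTERN.search(url))),
        "has_at": int("@" in url),
        "num_digits": sum(ch.isdigit() for ch in url),
    }


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def feature_label_correlations(rows: list, positive: str = "phishing") -> dict:
    """Correlate each feature with a binary is-phishing indicator."""
    feats = [lexical_features(row[URL_COLUMN]) for row in rows]
    labels = [1.0 if row[LABEL_COLUMN] == positive else 0.0 for row in rows]
    return {name: pearson([f[name] for f in feats], labels) for name in feats[0]}


if __name__ == "__main__":
    # Tiny made-up sample, only to exercise the functions.
    sample = (
        "url,type\n"
        "google.com,benign\n"
        "wikipedia.org/wiki/Phishing,benign\n"
        "http://192.168.10.5/paypal/login@secure,phishing\n"
        "paypal.com.verify-account.example.ru/signin,phishing\n"
    )
    rows = load_rows(sample)
    print(class_balance(rows))
    for name, r in feature_label_correlations(rows).items():
        print(f"{name}: {r:+.2f}")
```

On the real dataset, the same correlations could feed a heatmap in the visualization step.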
Gradient Boosting struck the best balance between response latency and accuracy, so we chose it as our final model.
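The latency/accuracy trade-off can be made concrete with a small harness. This is a minimal sketch, assuming each trained model is wrapped as a function that takes one URL and returns 1 for phishing and 0 otherwise. The selection rule (highest accuracy among models whose mean latency fits a budget, otherwise the fastest model) and the names `accuracy`, `mean_latency_ms` and `choose_model` are illustrative assumptions, not the exact procedure we used to pick Gradient Boosting.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

# A predictor maps one URL to 1 (phishing) or 0 (legitimate).
Predictor = Callable[[str], int]
Result = Tuple[str, float, float]  # (model name, accuracy, mean latency in ms)


def accuracy(predict: Predictor, urls: Sequence[str], labels: Sequence[int]) -> float:
    """Fraction of URLs whose predicted label matches the true label."""
    correct = sum(predict(u) == y for u, y in zip(urls, labels))
    return correct / len(urls)


def mean_latency_ms(predict: Predictor, urls: Sequence[str], repeats: int = 5) -> float:
    """Average wall-clock time of a single-URL prediction, in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for u in urls:
            predict(u)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(urls))


def choose_model(models: Dict[str, Predictor], urls: Sequence[str],
                 labels: Sequence[int], latency_budget_ms: float) -> Tuple[Result, List[Result]]:
    """Pick the most accurate model whose mean latency fits the budget.

    If no model fits the budget, fall back to the fastest one.
    """
    results = [
        (name, accuracy(p, urls, labels), mean_latency_ms(p, urls))
        for name, p in models.items()
    ]
    within_budget = [r for r in results if r[2] <= latency_budget_ms]
    if within_budget:
        best = max(within_budget, key=lambda r: r[1])
    else:
        best = min(results, key=lambda r: r[2])
    return best, results


if __name__ == "__main__":
    # Hypothetical toy predictors standing in for trained models.
    urls = ["google.com", "paypal.com@evil.example/login", "example.org"]
    labels = [0, 1, 0]
    candidates = {
        "always_phishing": lambda u: 1,
        "at_sign_rule": lambda u: int("@" in u),
    }
    best, results = choose_model(candidates, urls, labels, latency_budget_ms=5.0)
    for name, acc, lat in results:
        print(f"{name}: accuracy={acc:.2f}, latency={lat:.4f} ms")
    print("chosen:", best[0])
```

A latency budget matters here because the extension checks links while the user browses, so a slightly more accurate but much slower model can hurt the experience.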
- manifest.json holds the configuration for when the extension activates and which resources it can access from our access folder.
- background.js is responsible for fetching the URLs and ensuring that the extension works as intended.
- contentScript.js manipulates the DOM of the current page so that the extension integrates seamlessly with it.
- The popup HTML, JavaScript and CSS are responsible for the user interface of the extension.
- The trained model is saved as a pickle file.
- The pickled model is served with Flask, a lightweight (micro) backend framework.
- The Flask server exposes the model so that it can receive input and return its predictions to our extension.
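To illustrate the serving steps above, here is a sketch of the pickle round-trip plus a framework-agnostic request handler. Assumptions: `KeywordModel` is a hypothetical toy stand-in for our trained Gradient Boosting model, the model is assumed to accept raw URL strings (a real pipeline would first extract features), and the JSON shapes `{"url": ...}` and `{"phishing": ...}` are illustrative rather than the extension's actual contract. Flask is left out so the sketch runs with the standard library alone.

```python
import json
import os
import pickle
import tempfile


class KeywordModel:
    """Hypothetical toy stand-in for the trained classifier.

    It follows the scikit-learn convention of predicting a batch of inputs,
    but, unlike a real pipeline, it accepts raw URL strings instead of
    pre-computed feature vectors.
    """

    def __init__(self, keywords):
        self.keywords = [k.lower() for k in keywords]

    def predict(self, urls):
        # 1 = phishing, 0 = legitimate
        return [int(any(k in u.lower() for k in self.keywords)) for u in urls]


def save_model(model, path):
    """Serialize the trained model to disk with pickle."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load the pickled model; only unpickle files from a trusted source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def handle_predict(model, body: bytes) -> dict:
    """Turn a JSON body such as {"url": "..."} into a JSON-serializable verdict."""
    payload = json.loads(body)
    url = payload["url"]
    label = model.predict([url])[0]
    return {"url": url, "phishing": bool(label)}


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "model.pkl")
    save_model(KeywordModel(["login", "verify"]), path)
    model = load_model(path)  # done once, when the server starts
    print(handle_predict(model, b'{"url": "http://example.com/verify-account"}'))
```

In a Flask app, `load_model` would typically run once at start-up, and `handle_predict(model, request.get_data())` would be called from a POST view (for example one registered with `@app.route("/predict", methods=["POST"])`, a hypothetical route) that returns the result through `jsonify`.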
- The CRX file is available here. You can download it and place it in an appropriate folder to run the extension. Some links to navigate through the extension are:
- Using the repository
- Clone the repository using the command:
git clone https://github.com/shshwtsrkr/Phishing-attack-detection.git
- Go to Extensions in your browser and enable Developer mode.
- Click Load unpacked and select the cloned repository folder.