This work improves phishing detection on local websites in the following countries:
- Czech Republic (CZ)
- Denmark (DK)
- Estonia (EE)
- Croatia (HR)
- Hungary (HU)
- Lithuania (LT)
- Latvia (LV)
- Poland (PL)
- Romania (RO)
- Serbia (RS)
- Slovakia (SK)
- Slovenia (SI)

How to run:
- Clone the repository
- Install Jupyter notebook
- Run notebook phishing-detection-language.ipynb

Repository contents:
- phishing-detection-language.ipynb - Python source code
- result-raw - detailed phishing detection outcomes for each URL, organized by country
- result-figures - confusion matrices, word clouds, and analyses of false positives, categorized by country
- result-improved - webpages with refined phishing predictions
- result-reports - full phishing detection reports for each country
- benign-urls/urls-*country - 2 million benign webpage URLs, categorized by country
- benign-urls/urls-GENERIC - 1 million generic webpage URLs
- phishing-urls - 1 million phishing webpage URLs

Data sources:
- Benign URLs from Common Crawl
- Phishing URLs from Phishing.Database
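
The benign URL lists are organized by country, with a separate generic set. This README does not state how URLs were assigned to a country, so the following is only a minimal sketch under one assumption: that a URL belongs to a country when its hostname ends in that country's top-level domain. The names `COUNTRY_CODES`, `country_of`, and `group_by_country` are hypothetical and are not taken from the notebook.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Country codes listed in this README; each one matches its country-code TLD.
COUNTRY_CODES = {"CZ", "DK", "EE", "HR", "HU", "LT", "LV", "PL", "RO", "RS", "SK", "SI"}


def country_of(url: str) -> str:
    """Return the country code for a URL, or "GENERIC" if none matches.

    Assumption (not confirmed by the repository): the country is inferred
    from the hostname's top-level domain. URLs must include a scheme such
    as "https://", otherwise no hostname is parsed and "GENERIC" is returned.
    """
    host = urlsplit(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].upper()
    return tld if tld in COUNTRY_CODES else "GENERIC"


def group_by_country(urls):
    """Group an iterable of URLs into a dict of {country code: [urls]}."""
    groups = defaultdict(list)
    for url in urls:
        groups[country_of(url)].append(url)
    return dict(groups)
```

For example, `country_of("https://example.cz/login")` yields `"CZ"`, while a `.com` address falls into `"GENERIC"`. A TLD check is a coarse heuristic: local-language sites hosted under generic domains would be missed, which is one reason the actual notebook may use a different criterion.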