Comparison of Naïve Bayes and Support Vector Machine Methods in Classifying 22 Regional Languages

Indonesia is culturally diverse, with over 1300 ethnic groups and 2500 regional languages. This multitude of languages makes it difficult to identify a regional language from text. This research compares two Machine Learning methods, Naïve Bayes and Support Vector Machine, for classifying text in 22 Indonesian regional languages, with the aim of understanding the relative performance of each method.

Evaluation results show that Support Vector Machine achieves the highest accuracy at 89.41%, while Naïve Bayes reaches 82.08%. Naïve Bayes performs well, but Support Vector Machine is the preferred choice for model implementation. The model is deployed with Streamlit, where the Support Vector Machine correctly identifies the language of Javanese song lyrics.

The main constraint of this research is the complexity of Indonesia's regional languages, which differ in characteristics, grammar, and sentence structure, so accuracy is not yet perfect. This leaves room for future research through parameter optimization or exploration of alternative methods. The research can help users identify regional languages from text and contributes to the understanding of Machine Learning methods for classifying regional language texts. Despite its limitations, the study can be extended to other regional languages.
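To make the classification task concrete, here is a minimal sketch of language identification with a multinomial Naïve Bayes classifier, written with the Python standard library only. It is not the code used in this repository: the character trigram features, the add-one smoothing, the class name `NaiveBayesLanguageID`, and the four toy training sentences are all assumptions made for illustration. The sentences are not taken from Korpus Nusantara.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams (lowercased, space-padded)."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLanguageID:
    """Hypothetical helper: multinomial Naive Bayes over character n-grams."""

    def fit(self, texts, labels):
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram frequencies
        self.doc_counts = Counter(labels)         # label -> number of training texts
        for text, label in zip(texts, labels):
            self.ngram_counts[label].update(char_ngrams(text))
        # Shared vocabulary, used for add-one (Laplace) smoothing.
        self.vocab = {g for counts in self.ngram_counts.values() for g in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.ngram_counts.items():
            # Log prior of the label plus smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy sentences written for this example only (assumption, not corpus data).
train_texts = [
    "aku arep mangan sega",
    "kowe arep lunga menyang ngendi",
    "abdi hoyong tuang sangu",
    "anjeun bade angkat ka mana",
]
train_labels = ["Jawa", "Jawa", "Sunda", "Sunda"]

model = NaiveBayesLanguageID().fit(train_texts, train_labels)
prediction = model.predict("aku arep lunga")
print(prediction)
```

Character n-grams are a common choice for language identification because closely related languages often differ in short spelling patterns rather than in whole words. Whether the published model uses them is not stated here. A real comparison would train both classifiers on the full corpus and report accuracy on held-out text.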

📁 Dataset Information

The data used is taken from Korpus Nusantara.

📷 Screenshot

🌐 Deployment

The prediction model is deployed using Streamlit, and you can interact with it here.

🎌 Supported Regional Languages:

No.	Language	No.	Language
1.	Batak Toba	12.	Madura
2.	Bugis Kelolau	13.	Melayu Kembayan
3.	Bugis Wajo	14.	Melayu Ketapang
4.	Dayak Ahe	15.	Melayu Melawi
5.	Dayak Pesaguan	16.	Melayu Pontianak
6.	Dayak Taman	17.	Melayu Putussibau
7.	Jawa	18.	Melayu Sambas
8.	Jawa Kromo	19.	Melayu Sintang
9.	Jawa Ngoko	20.	Padang
10.	Kapuas Hulu	21.	Sunda
11.	Khek Pontianak	22.	Tio Ciu Pontianak

🔄 Research Flow

📙 Published Paper

See Published Paper Here.


bimarakajati/Comparison-of-Naive-Bayes-and-Support-Vector-Machine-Methods-in-Classifying-22-Regional-Languages

Folders and files

Latest commit

History

Repository files navigation

Comparison of Naïve Bayes and Support Vector Machine Methods in Classifying 22 Regional Languages

📁 Dataset Information

📷 Screenshot

🌐 Deployment

🎌 Supported Regional Languages:

🔄 Research Flow

📙 Published Paper

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages