LID_TCD

Dataset for Ethiopian language identification and topic classification

This dataset consists of 22,624 texts labeled for two tasks:

- Language identification: identify the language a given text is written in.
- Topic classification: assign a topic to a given text based on its content.
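
Both tasks use the same texts but different label sets. The following is a minimal sketch, assuming each text carries one language label and one topic label. The record layout, the label names (`lang_a`, `topic_x`, ...) and the helper functions are hypothetical placeholders, not taken from the dataset files. It shows how each label set can be turned into one-hot targets of the kind a Keras classifier with a softmax output usually expects:

```python
from collections import OrderedDict

# Hypothetical records: (text, language label, topic label).
# Real label names come from the dataset; these are placeholders.
records = [
    ("text 1", "lang_a", "topic_x"),
    ("text 2", "lang_b", "topic_y"),
    ("text 3", "lang_a", "topic_y"),
]


def build_label_index(labels):
    """Map each distinct label to an integer id, in first-seen order.

    OrderedDict keeps insertion order on Python 3.5 as well.
    """
    index = OrderedDict()
    for label in labels:
        if label not in index:
            index[label] = len(index)
    return index


def one_hot(label, index):
    """Return a one-hot list for `label` given a label -> id mapping."""
    vec = [0] * len(index)
    vec[index[label]] = 1
    return vec


# One label index and one target matrix per task.
lang_index = build_label_index(r[1] for r in records)
topic_index = build_label_index(r[2] for r in records)

y_lang = [one_hot(r[1], lang_index) for r in records]
y_topic = [one_hot(r[2], topic_index) for r in records]

print(y_lang)   # [[1, 0], [0, 1], [1, 0]]
print(y_topic)  # [[1, 0], [0, 1], [0, 1]]
```

Keras also provides `keras.utils.to_categorical` for the one-hot step once labels are integer ids.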

To run the code from a terminal, use the following commands:

# Load and Pre-process data
python preprocess.py

# Train
python train.py

# Test and results
python test.py
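
The three steps must run in this order, because training depends on the preprocessed data and testing depends on the trained model. If a single entry point is preferred, the steps can be chained from Python. This is a minimal sketch, assuming it runs from the repository root (next to preprocess.py, train.py and test.py). `STEPS`, `build_commands` and `run_pipeline` are hypothetical helpers, not part of the repository:

```python
import subprocess
import sys

# Order taken from the instructions above: preprocess, train, test.
STEPS = ("preprocess.py", "train.py", "test.py")


def build_commands(scripts, python=sys.executable):
    """Return one argv list per script, using the given interpreter."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts=STEPS):
    """Run each script in order and stop at the first failing step."""
    for cmd in build_commands(scripts):
        print("Running: " + " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run after a failed one.
        subprocess.run(cmd, check=True)
```

Call `run_pipeline()` from the repository root. Script paths are resolved relative to the current working directory, and a failing step raises `subprocess.CalledProcessError`.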

Things to know

The code was tested with:
- Python 3.5.2
- Keras 2.3.1
- TensorFlow 2.1.0
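
A local setup can be compared against these tested versions before running the scripts. This is a minimal sketch. The helper names are hypothetical, and it only reports differences: other versions are simply untested, not known to fail.

```python
import platform

# Versions listed above as the tested environment.
TESTED = {"python": "3.5.2", "keras": "2.3.1", "tensorflow": "2.1.0"}


def installed_versions():
    """Collect Python, Keras and TensorFlow versions (None if not importable)."""
    versions = {"python": platform.python_version()}
    for name in ("keras", "tensorflow"):
        try:
            module = __import__(name)
            versions[name] = getattr(module, "__version__", None)
        except ImportError:
            versions[name] = None
    return versions


def mismatches(installed, tested=TESTED):
    """Return {name: (installed, tested)} for every entry that differs."""
    return {
        name: (installed.get(name), want)
        for name, want in tested.items()
        if installed.get(name) != want
    }
```

For example, `mismatches(installed_versions())` returns an empty dict on the tested environment.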

Repository: bdu-birhanu/LID_TCD