ldig (Language Detection with Infinity Gram)

ldig is a prototype language detector for short messages such as tweets. It reaches 99.1% accuracy across 17 languages.

Usage

Extract the model directory: tar xf models/[select model archive]
Run detection: ldig.py -m [model directory] [text data file]

Data format

The input is a text file with one tweet per line, in the following format:

[label]\t[some metadata separated by '\t']\t[text without '\t']

[label] is a language code such as en, de, or fr. Like the metadata, it is optional. (ldig uses neither the label nor the metadata for detection, of course :D)
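As a rough illustration of the input layout, here is a minimal Python sketch that builds and splits one line. The helper names `format_line` and `parse_line` are hypothetical and not part of ldig. The sketch assumes the label is always present as the first field and the text is always the last field.

```python
import re


def format_line(label, text, metadata=()):
    """Build one input line: label, metadata fields, then the text (hypothetical helper)."""
    # The text field must not contain tabs, and each tweet must stay on one line,
    # so tabs and line breaks inside the text are replaced with a single space.
    clean = re.sub(r"[\t\r\n]+", " ", text)
    return "\t".join([label, *metadata, clean])


def parse_line(line):
    """Split one input line into (label, metadata, text) (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        # Assumption of this sketch: every line carries at least a label and a text.
        raise ValueError("expected at least a label and a text field")
    return fields[0], fields[1:-1], fields[-1]
```

For example, `format_line("en", "hello world", ["123", "user"])` yields a line with four tab-separated fields, and `parse_line` returns them as the label, a list of metadata fields, and the text.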

The output of ldig has the following format:

[correct label]\t[detected label]\t[original metadata and text]
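Because each output line carries both the correct and the detected label, per-language accuracy can be computed from it directly. The following is a minimal sketch, not part of ldig; the function name `accuracy_by_label` is hypothetical, and it assumes every scored line starts with the two label fields described above.

```python
from collections import Counter


def accuracy_by_label(lines):
    """Return {correct label: fraction of lines detected correctly} (hypothetical helper)."""
    total = Counter()
    correct = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            # Skip blank or malformed lines that lack the two label fields.
            continue
        expected, detected = fields[0], fields[1]
        total[expected] += 1
        if expected == detected:
            correct[expected] += 1
    return {label: correct[label] / total[label] for label in total}
```

Assuming the output was saved to a file, an open file object can be passed in directly, since the function only iterates over lines.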

Estimation Tool

ldig has an estimation tool.

./server.py -m [model directory]

Open http://localhost:48000 and enter the target text in the textarea. ldig then outputs the language probabilities and the feature parameters found in the text.

Supported Languages

cs Czech
da Danish
de German
en English
es Spanish
fi Finnish
fr French
id Indonesian
it Italian
nl Dutch
no Norwegian
pl Polish
pt Portuguese
ro Romanian
sv Swedish
tr Turkish
vi Vietnamese

Documents

Copyright & License

All code and resources are available under the MIT License.
