This repository curates the data of the Spoken corpus of the dialects of Khakas and provides an alternative way to access the corpus data locally. The data is stored in data_oral_khakas_corpus.csv, which has 85107 rows and 14 columns:
filename
time_start
time_end
speaker
recorded
sentence_id
text
translation
word_forms
morphonology
gloss
language
dataset_creator
dataset_provider
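As a rough illustration of local access, the sketch below reads the file with Python's standard csv module and counts rows per speaker. It assumes the file is comma-delimited, UTF-8 encoded, and has a header row with the column names listed above; none of this is stated here, so adjust the delimiter and encoding arguments if your copy differs. The helper names load_corpus and rows_per_speaker are hypothetical and not part of this repository.

```python
import csv
from collections import Counter

# Column names as listed in this README.
EXPECTED_COLUMNS = [
    "filename", "time_start", "time_end", "speaker", "recorded",
    "sentence_id", "text", "translation", "word_forms", "morphonology",
    "gloss", "language", "dataset_creator", "dataset_provider",
]


def load_corpus(path, delimiter=",", encoding="utf-8"):
    """Read the corpus CSV into a list of dicts, one dict per row.

    Assumption: the file has a header row; delimiter and encoding
    are guesses and can be overridden.
    """
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        header = reader.fieldnames or []
        # Fail early if the header does not match the documented columns.
        missing = [c for c in EXPECTED_COLUMNS if c not in header]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        return list(reader)


def rows_per_speaker(rows):
    """Count how many rows belong to each value of the speaker column."""
    return Counter(row["speaker"] for row in rows)
```

For example, rows = load_corpus("data_oral_khakas_corpus.csv") followed by rows_per_speaker(rows).most_common(5) would list the five speakers with the most rows, provided the assumptions above hold.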
The Spoken corpus of the dialects of Khakas contains transcribed and annotated texts synchronized with the sound recordings. The texts were recorded in the 21st century from speakers born between 1916 and 1985, during several expeditions from Moscow to the Republic of Khakassia. All texts are translated into Russian. The texts were analyzed with an automatic parser, then edited and synchronized with the sound using the ELAN software.
If you use data from the Spoken corpus of the dialects of Khakas in your research, please cite as follows:
Vera Maltseva, Elena Sokur. Spoken corpus of the dialects of Khakas. Moscow: Institute of Linguistics; Moscow: Linguistic Convergence Laboratory, NRU HSE. (Available online at http://lingconlab.ru/spoken_khakas/, accessed on ....)
For questions about the corpus data, contact Vera Maltseva or open an issue in this repository:
malt.wh@gmail.com (Vera Maltseva)
For questions about the search platform, contact Elena Sokur or open an issue in the platform's own repository:
elena.o.sokur@gmail.com (Elena Sokur)