Code to fine-tune Wav2vec2-xls-r for non-native L2 Korean automatic pronunciation assessment (APA), as part of the 2022 NIA 1-13 research work

rhss10/korean_automatic_pronunciation_assessment_nia-22-1-13

General

  • Source code for fine-tuning a self-supervised learning (SSL) model on the NIA-2022-1-13 Non-native L2 Korean Dataset for Automatic Pronunciation Assessment (APA).
  • The NIA-2022-1-13 Non-native L2 Korean Dataset for APA is scheduled for release in 2023.
  • More information on dataset usage and Docker support will be added with the release of the dataset.

License

  • SPDX-FileCopyrightText: © 2023 Hyungshin Ryu <rhss10@snu.ac.kr>
  • SPDX-License-Identifier: Apache-2.0

Notes

  • The NIA-2022-1-13 Non-native L2 Korean Dataset provides proficiency scores for three aspects: 'comprehensibility', 'fluency', and 'accentedness'.
  • The example code scores 'comprehensibility'.
  • By changing the data/preprocess_data.py code, you can assess 'fluency' or 'accentedness' instead.
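
As a rough illustration of what switching the target aspect could look like, the sketch below picks one of the three scores as the training label. It does not reproduce data/preprocess_data.py: the record layout, the field names ("audio", "label"), the file names, the score values, and the select_target helper are all assumptions made for this example, since the dataset format has not been published here.

```python
from typing import Dict, List

# The three score aspects named in the notes above.
ASPECTS = ("comprehensibility", "fluency", "accentedness")


def select_target(records: List[Dict], aspect: str = "comprehensibility") -> List[Dict]:
    """Hypothetical helper: keep the audio path and expose the chosen score as 'label'."""
    if aspect not in ASPECTS:
        raise ValueError(f"unknown aspect {aspect!r}; expected one of {ASPECTS}")
    return [{"audio": r["audio"], "label": r[aspect]} for r in records]


# Made-up records for illustration only; the real dataset layout may differ.
records = [
    {"audio": "utt_0001.wav", "comprehensibility": 4, "fluency": 3, "accentedness": 2},
    {"audio": "utt_0002.wav", "comprehensibility": 2, "fluency": 2, "accentedness": 5},
]

# Switch the target from the default 'comprehensibility' to 'fluency'.
fluency_rows = select_target(records, aspect="fluency")
print(fluency_rows)
```

Keeping the aspect name as a single parameter means the rest of a pipeline can stay unchanged when the target score changes, which is likely the kind of edit the note above has in mind.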

Commands

Prepare Data

# Preprocess the data; make sure the ACTUAL data path is used
python preprocess_data.py
# Create Hugging Face datasets (Arrow files)
python create_datasets.py

Train

# Example command for training. For more supported arguments, please refer to train.py
python train.py --exp_prefix NIA
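
For readers unfamiliar with this style of command, here is a minimal sketch of how an argument such as --exp_prefix might be parsed and used to name an experiment's output directory. Only the flag name comes from the command above; the default value, the run_dir helper, the "runs" directory, and the naming scheme are assumptions for illustration, and train.py itself may handle the prefix differently.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Illustrative training CLI sketch")
    # --exp_prefix appears in the example command; the default here is an assumption.
    parser.add_argument("--exp_prefix", type=str, default="exp")
    return parser


def run_dir(exp_prefix: str, root: str = "runs") -> Path:
    """Hypothetical helper: derive an output directory name from the prefix."""
    return Path(root) / f"{exp_prefix}_comprehensibility"


# Parse the same argument as in the example command above.
args = build_parser().parse_args(["--exp_prefix", "NIA"])
out = run_dir(args.exp_prefix)
print(out)
```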

Test

# Example
python test.py
