GitHub - memonji/word_translation: ML4NLP Analysis

Mapping semantic spaces for translation through PLSR exploring intelligibility effects

Author: Emma Angela Montecchiari

Course: Università di Trento - Machine Learning for NLP 2022/23

Date: July 5, 2024

Project Proposal

Abstract

This study explores how effectively semantic spaces support word translation among closely related languages (Catalan, Italian, and English). Using Partial Least Squares Regression (PLSR), it investigates how linguistic intelligibility influences model performance. Results indicate that language pairs with closer lexical proximity reach higher translation accuracy.

Contents

main.py: Interactive script to train, test and evaluate the PLSR model on the translation task. The stored outputs (nearest neighbours obtained with the best-performing parameters) are in ./results_bestncomps/manual_selection/.
data_handling.py: Interactive script to (A) extract key words from the semantic spaces and store them; (B) plot 2D and 3D representations of the spaces. The stored material is in the ./pairs/ and ./spaces/figures/ folders.
cosine_similarity.py: Script to compute the cosine similarity between the predicted nearest neighbours and the gold-standard translations and store the results. The outputs for the best-performing parameters are in ./results_bestncomps/cosine_similarity/.

Pre-trained semantic spaces: Downloaded pre-trained Catalan, English and Italian semantic spaces. Stored in ./spaces/
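
A minimal sketch of how such a space could be read into memory is shown below. The file format is an assumption, not something the repository confirms: one word per line followed by its space-separated vector components. The function name `read_space` and the sample values are hypothetical.

```python
import io


def read_space(lines):
    """Parse an iterable of text lines into a {word: vector} dictionary.

    ASSUMED format: one word per line, followed by its space-separated
    vector components. Blank or malformed lines are skipped.
    """
    space = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        space[fields[0]] = [float(value) for value in fields[1:]]
    return space


# Hypothetical miniature space standing in for a file under ./spaces/.
sample = io.StringIO("gat 0.1 0.3 -0.2\ncasa 0.5 -0.1 0.0\n")
space = read_space(sample)
print(space["gat"])
```

With a real file, the same function can be given an open file handle instead of the in-memory sample, since it only iterates over lines.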

Requirements

The code was developed with Python 3.11.3 in a Conda (23.5.2) environment.

Required packages:

scikit-learn 1.2.2
numpy 1.24.3
docopt 0.6.2
matplotlib 3.7.1

