GitHub - mfhan/translations: All the Feelings in Les Miz: Sentiment analysis on literary translations

A sentiment-Al Journey: Sentiment Analysis Models and Literary Translation

[Svelte visualization:] https://svelte.dev/repl/161b41989f334a12a4c2d8383831df0d?version=4.1.2

What I wanted to find out:

I've long wondered about the quality of translations, especially when it comes to fiction. Raised in a multilingual household and having dabbled in comparative literature at university, I always yearned to find an effective way of measuring the emotional impact of a text. After three whole hours of lecture on language models -- clearly plenty of time to understand everything about sentiment analysis -- I decided to experiment a bit.

Summary of the data collection process, with links

-- Excerpts from Les Misérables, In Search of Lost Time, and Wuthering Heights, together with their translations, obtained from Project Gutenberg.

Overview of the data analysis process

-- Text preparation with Python and pandas.
-- Manual fine-tuning of sentence segments to ensure similarity between the two versions.
-- Sentiment analysis performed with the twitter-XLM-roBERTa-base multilingual model, trained on 198 million tweets.
-- Data visualization performed with Datawrapper, then Flourish.
-- Secondary data visualization experiment conducted with D3/Svelte.
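The scoring step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the function names (`load_xlmr_classifier`, `score_pairs`) are hypothetical, and the checkpoint id `cardiffnlp/twitter-xlm-roberta-base-sentiment` is an assumption about which published model the "twitter-XLM-roBERTa-base" description refers to.

```python
from typing import Callable, Dict, List, Tuple

# A classifier maps one text segment to (label, score).
Classifier = Callable[[str], Tuple[str, float]]


def load_xlmr_classifier() -> Classifier:
    """Wrap a Hugging Face pipeline as a Classifier.

    Assumption: the checkpoint id below is a guess at the model this README
    calls "twitter-XLM-roBERTa-base"; the source does not confirm it.
    Requires `pip install transformers torch` and downloads the weights.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    pipe = pipeline(
        "sentiment-analysis",
        model="cardiffnlp/twitter-xlm-roberta-base-sentiment",
    )

    def classify(text: str) -> Tuple[str, float]:
        result = pipe(text)[0]  # a dict with "label" and "score" keys
        return result["label"].lower(), float(result["score"])

    return classify


def score_pairs(
    originals: List[str], translations: List[str], classify: Classifier
) -> List[Dict[str, object]]:
    """Score manually aligned segments; both lists must have equal length."""
    if len(originals) != len(translations):
        raise ValueError("segments are not aligned one-to-one")
    rows = []
    for index, (src, dst) in enumerate(zip(originals, translations)):
        src_label, src_score = classify(src)
        dst_label, dst_score = classify(dst)
        rows.append({
            "index": index,
            "original": src,
            "original_label": src_label,
            "original_score": src_score,
            "translation": dst,
            "translation_label": dst_label,
            "translation_score": dst_score,
            "labels_agree": src_label == dst_label,
        })
    return rows
```

With the model installed, something like `rows = score_pairs(fr_segments, en_segments, load_xlmr_classifier())` would produce one record per aligned pair, and `pandas.DataFrame(rows)` would turn the result into a table for export to a charting tool. Keeping the classifier as a plain function argument also makes it easy to swap in another model later.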

New skills & learning opportunities:

-- Everything took a lot more time than planned. Selecting the texts took several tries, as I wanted to work on texts featuring three-dimensional characters and depicting complex emotions.
-- Sending everything into the roBERTa model also took a while. I had to re-cut text segments on several occasions.
-- Once I collected the outcomes of the sentiment analyses, I looked for appropriate visualization options.
-- Despite my reluctance to work with Flourish after prior misadventures, I decided to give it another try and was pleasantly surprised by the flexibility of the platform.

What I really found:

-- I was surprised by some inexplicable labeling on non-ambiguous segments. Why does the segment "with a desperate effort" rate positive with a 0.60 score? Why is the phrase "the forgotten strains of happiness" labeled negative in English and positive in French?
-- I was surprised at the relative naivete of the model, which gave a positive label to the English sentence "This grand little soul had taken its flight," a clear reference to death. The French sentence was correctly labeled negative with a 0.69 score.
-- My initial hunch, that translations may by nature tend to pick a safe middle ground and therefore be labeled as more neutral than the original, was not completely supported.
-- Another possible avenue, that some languages are more polarizing than others, feels very promising.
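One way to check the "safe middle ground" hunch is to compare the share of neutral labels in the original and in the translation, and to list the segments whose labels differ. The sketch below assumes the labels have already been collected into two aligned lists; the helper names are hypothetical and the sample labels are made up for illustration, not taken from the project's results.

```python
from collections import Counter
from typing import Dict, List


def label_shares(labels: List[str]) -> Dict[str, float]:
    """Return the fraction of segments carrying each label."""
    if not labels:
        return {}
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


def disagreements(original: List[str], translation: List[str]) -> List[int]:
    """Return indices of aligned segments whose labels differ."""
    if len(original) != len(translation):
        raise ValueError("label lists are not aligned one-to-one")
    return [i for i, (a, b) in enumerate(zip(original, translation)) if a != b]


# Made-up labels for illustration only; these are NOT results from the project.
fr_labels = ["negative", "neutral", "positive", "negative"]
en_labels = ["positive", "neutral", "neutral", "negative"]

fr_neutral = label_shares(fr_labels).get("neutral", 0.0)
en_neutral = label_shares(en_labels).get("neutral", 0.0)
print(f"neutral share, original: {fr_neutral:.2f}, translation: {en_neutral:.2f}")
print("segments with different labels:", disagreements(fr_labels, en_labels))
```

A higher neutral share in the translation would be consistent with the hunch, while the list of disagreeing indices points to the segments worth rereading by hand.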

What I tried to do but did not have the skills/time (but might do if I had more time)

-- As always, I wish I had had more time to broaden my area of inquiry. I would like to compare translations in other language pairs. I would also, of course, experiment with other language models -- although I remain surprised at how difficult it's been to find reliable multilingual

Main notebooks, included in this repo: lesmiz.ipynb, proust.ipynb, wuthering2.ipynb

Link to published page: https://mfhan.github.io/translations/
