Conversion to Romaji fails for translated non-Japanese location names #8

Open
SFungDev opened this issue Mar 7, 2018 · 3 comments

@SFungDev

SFungDev commented Mar 7, 2018

Conversion to Romaji fails for some locations whose names were transliterated into Japanese (Katakana) from another language.

For example:
アボッツフォード空港 (Abbotsford Airport)
アンガルシー (Anglesey)
コンスタンツァ (Constanta)
エスペランス空港 (Esperance Airport)

There are many other examples of this behaviour. With debugging turned on, it looks like it could be an issue with the tokenizer or the token features. I have very little knowledge of Japanese or of how this conversion works, so I can't provide much more insight.

Thanks!

@nicolas-raoul
Owner

Thanks for your feedback!
For each one: What result are you getting, and what result did you expect?
Cheers!

@SFungDev
Author

SFungDev commented Mar 8, 2018

Hi! In every failing case I've seen, the Katakana part of the output is identical to the input; only the non-Katakana characters are converted properly (e.g. the 空港 in airport names).
I was hoping for Romaji representations of these location names.

Here are the full debug outputs for the above examples:

./jakaroma.sh アボッツフォード空港
アボッツフォード        名詞,一般,*,*,*,*,*,*,*
空港    名詞,一般,*,*,*,*,空港,クウコウ,クーコー
Type: 一般
Type: 一般
アボッツフォード Ku-ko-

./jakaroma.sh アンガルシー
アンガルシー    名詞,固有名詞,組織,*,*,*,*,*,*
Type: 固有名詞
アンガルシー

./jakaroma.sh コンスタンツァ
コンスタンツァ  名詞,固有名詞,組織,*,*,*,*,*,*
Type: 固有名詞
コンスタンツァ

./jakaroma.sh エスペランス空港
エスペランス    名詞,一般,*,*,*,*,*,*,*
空港    名詞,一般,*,*,*,*,空港,クウコウ,クーコー
Type: 一般
Type: 一般
エスペランス Ku-ko-
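
In the failing outputs, the last three feature fields of the Katakana token are all `*`, while 空港 carries クウコウ in the second-to-last field. The sketch below parses debug lines of this shape and lists the tokens that have no reading. It is only an illustration in Python, not Jakaroma's own code; the function names are hypothetical, and treating field index 7 as the reading is an assumption based on the IPADIC-style layout these lines appear to follow.

```python
def parse_token_line(line):
    """Split one debug line into (surface, list of feature fields)."""
    surface, feature_text = line.split(None, 1)
    return surface, feature_text.strip().split(",")


def has_reading(features):
    # Assumption: index 7 holds the Katakana reading; "*" means none was supplied.
    return len(features) > 7 and features[7] != "*"


# Token lines copied from the debug output in this issue.
lines = [
    "アボッツフォード        名詞,一般,*,*,*,*,*,*,*",
    "空港    名詞,一般,*,*,*,*,空港,クウコウ,クーコー",
    "アブダビ        名詞,固有名詞,地域,一般,*,*,アブダビ,アブダビ,アブダビ",
]

parsed = [parse_token_line(line) for line in lines]
missing = [surface for surface, features in parsed if not has_reading(features)]
print(missing)
```

Run against the lines above, only アボッツフォード ends up in `missing`, which matches the token that is left unconverted in the output.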

I haven't looked into performance on Hiragana much as I'm dealing exclusively with locations, which to my knowledge are usually written in either Kanji or Katakana.

Here are some examples of translations that work as I had hoped:

Abu Dhabi Airport

./jakaroma.sh アブダビ空港
アブダビ        名詞,固有名詞,地域,一般,*,*,アブダビ,アブダビ,アブダビ
空港    名詞,一般,*,*,*,*,空港,クウコウ,クーコー
Type: 固有名詞
Type: 一般
Abudabi Ku-ko- 

Mt. Fuji Shizuoka Airport

./jakaroma.sh 富士山静岡空港
富士山  名詞,固有名詞,一般,*,*,*,富士山,フジサン,フジサン
静岡    名詞,固有名詞,地域,一般,*,*,静岡,シズオカ,シズオカ
空港    名詞,一般,*,*,*,*,空港,クウコウ,クーコー
Type: 固有名詞
Type: 固有名詞
Type: 一般
Fujisan Shizuoka Ku-ko- 

Oklahoma City Airport

./jakaroma.sh オクラホマシティー空港
オクラホマ      名詞,固有名詞,地域,一般,*,*,オクラホマ,オクラホマ,オクラホマ
シティー        名詞,一般,*,*,*,*,シティー,シティー,シティー
空港    名詞,一般,*,*,*,*,空港,クウコウ,クーコー
Type: 固有名詞
Type: 一般
Type: 一般
Okurahoma Shiti- Ku-ko- 

I hope these help; I can provide more examples if needed.
Cheers!

@nicolas-raoul
Owner

Thanks for the examples!

Very strange that アボッツフォード空港 fails when アブダビ空港 works :-o

If you figure out what the difference is and what triggers the problem, please let us know. Thanks! :-)
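
One difference is visible in the debug output posted earlier: アブダビ comes back with a reading field (アブダビ), whereas アボッツフォード comes back with `*` in that position, which suggests the word is not in the tokenizer's dictionary. A possible workaround, sketched here in Python purely as an illustration, is to fall back to the surface form when the reading is missing and the surface is entirely Katakana, and then feed that to the existing Katakana-to-Romaji step. The function names are hypothetical, and the field index is an assumption based on the IPADIC-style layout the output appears to use.

```python
def is_katakana(text):
    # The Katakana block U+30A0..U+30FF also covers the long-vowel mark ー (U+30FC).
    return bool(text) and all("\u30a0" <= ch <= "\u30ff" for ch in text)


def reading_or_fallback(surface, features):
    """Return the dictionary reading, or the surface itself for unknown Katakana words."""
    # Assumption: index 7 is the reading field; "*" means no reading was supplied.
    reading = features[7] if len(features) > 7 else "*"
    if reading != "*":
        return reading
    if is_katakana(surface):
        return surface  # already phonetic, so it can be romanized directly
    return None  # e.g. unknown Kanji: no safe fallback


unknown = "名詞,固有名詞,組織,*,*,*,*,*,*".split(",")
known = "名詞,一般,*,*,*,*,空港,クウコウ,クーコー".split(",")
print(reading_or_fallback("アンガルシー", unknown))
print(reading_or_fallback("空港", known))
```

With this fallback the first call returns the Katakana surface unchanged and the second returns the dictionary reading, so both could go through the same Romaji conversion.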
