This repository has been archived by the owner on Nov 21, 2023. It is now read-only.
The Cantonese (Yue Chinese, yue_Hant) data in FLORES-200 is completely wrong. The data is not Cantonese at all, but rather Mandarin Chinese in Traditional Chinese script (zho_Hant), which differs only stylistically from the zho_Hant data in the dataset.
Furthermore, the paper mentioned that the yue_Hant and zho_Hant data tend to be predicted as each other. It turns out that both datasets actually consist of zho_Hant data exclusively. yue_Hant and zho_Hant should actually be very easy to distinguish from each other.
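Here is what correct yue_Hant data would look like:

| Language Code | Sentence |
| --- | --- |
| eng_Latn | They found the Sun operated on the same basic principles as other stars: The activity of all stars in the system was found to be driven by their luminosity, their rotation, and nothing else. |
| zho_Hant | |
| yue_Hant (wrong) | |
| yue_Hant (corrected) | |

(Bold denotes words that are used exclusively in yue_Hant.)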
I guess nobody on the FLORES team knows Cantonese and Mandarin well enough to understand the unique situation of this language. The current data collected for yue is Hong Kong Chinese, NOT Cantonese. We recommend using this classifier to filter for real Cantonese data: https://github.com/CanCLID/cantonese-classifier
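To illustrate why the two varieties should be easy to tell apart, here is a minimal sketch of a character-marker heuristic. It is not the classifier linked above and does not reflect its API. The marker sets and the function name `guess_variety` are assumptions made up for this example, and the sets are far from exhaustive.

```python
# Hypothetical marker sets, for illustration only.
# Characters common in written Cantonese but essentially absent from written Mandarin.
CANTONESE_MARKERS = set("嘅咗唔喺佢哋嘢咁啲冇乜嗰嚟睇")
# Function words typical of written Mandarin that written Cantonese usually replaces.
MANDARIN_MARKERS = set("的是他她們這那沒")


def guess_variety(sentence: str) -> str:
    """Return 'yue_Hant', 'zho_Hant', or 'unknown' based on marker counts."""
    yue = sum(ch in CANTONESE_MARKERS for ch in sentence)
    cmn = sum(ch in MANDARIN_MARKERS for ch in sentence)
    if yue > cmn:
        return "yue_Hant"
    if cmn > yue:
        return "zho_Hant"
    # No markers at all, or a tie: the heuristic cannot decide.
    return "unknown"


if __name__ == "__main__":
    print(guess_variety("佢哋喺度食緊嘢"))    # Cantonese: "They are eating here"
    print(guess_variety("他們在這裡吃東西"))  # Mandarin rendering of the same sentence
```

A counting heuristic like this will misfire on short or mixed sentences, so it is only a quick sanity check on a corpus, not a substitute for review by Cantonese speakers.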