Interpretation of score #11

Natalie-Caruana · 2022-04-29T15:40:13Z

Hi, pycld2's detect function returns four values when returnVectors is set to True (and three, without vectors, when it is left at its default of False). As I understand it, assuming one language is detected, spacy-cld calculates its confidence score by taking the third element of the first entry in details, the third value returned by pycld2, and dividing it by 100, i.e.

reliable, textBytesFound, details, vectors = cld2.detect(text, returnVectors=True)

spacy_score = details[0][2] / 100

However, pycld2's documentation for the detect function explains the third return value, details, as follows:

details: tuple
Tuple of up to three detected languages, where each is a tuple of (languageName, languageCode, percent, score). percent is what percentage of the original text was detected as this language and score is the confidence score for that language.
So if percent is the percentage of the original text detected as that language, then it says nothing about how good the prediction was. Shouldn't some form of normalization be applied to the fourth element, score, instead?
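
To make the distinction concrete, here is a minimal sketch that reads both fields from a details tuple laid out as in the documentation quoted above. It does not call pycld2: the sample values are made up for illustration, the helper names (Detected, percent_confidence, score_share) are hypothetical, and dividing each score by the total is only one conceivable normalization, not something pycld2 or spacy-cld is known to do.

```python
from typing import Dict, NamedTuple, Sequence, Tuple

Entry = Tuple[str, str, int, float]


class Detected(NamedTuple):
    """One entry of `details`, in the field order given in the quoted docs."""
    language_name: str
    language_code: str
    percent: int   # share of the text attributed to this language
    score: float   # confidence score for this language


def percent_confidence(details: Sequence[Entry]) -> float:
    """The calculation described above: percent of the first entry / 100."""
    return Detected(*details[0]).percent / 100


def score_share(details: Sequence[Entry]) -> Dict[str, float]:
    """Hypothetical alternative: each language's score as a share of the total."""
    entries = [Detected(*d) for d in details]
    total = sum(e.score for e in entries)
    if total == 0:
        # Nothing was scored, so avoid dividing by zero.
        return {e.language_code: 0.0 for e in entries}
    return {e.language_code: e.score / total for e in entries}


# Illustrative values only, NOT real pycld2 output.
sample_details = (
    ("ENGLISH", "en", 75, 1000.0),
    ("FRENCH", "fr", 25, 600.0),
)

print(percent_confidence(sample_details))  # based on percent
print(score_share(sample_details))         # based on score
```

With these made-up numbers the two approaches disagree (75% of the text versus a 62.5% share of the total score), which is the gap I am asking about.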

Thanks
