Interpretation of score #11

Open
Natalie-Caruana opened this issue Apr 29, 2022 · 0 comments

Natalie-Caruana commented Apr 29, 2022

Hi, pycld2's detect function returns four values when returnVectors is set to True (with the default, returnVectors=False, it returns only the first three). As I understand it, assuming a single language is detected, spacy-cld computes its confidence score by taking the third element of the first tuple in details (the third value returned by pycld2) and dividing it by 100, i.e.

reliable, textBytesFound, details, vectors = cld2.detect(text, returnVectors=True)

spacy_score = details[0][2] / 100

However, pycld2's documentation for detect describes the third return value, details, as follows:

details: tuple
Tuple of up to three detected languages, where each is a tuple (languageName, languageCode, percent, score). percent is what percentage of the original text was detected as this language and score is the confidence score for that language.

So if percent is the percentage of the original text detected as that language, it says nothing about how good the prediction was. Shouldn't some form of normalization be applied to the fourth element, score, instead?
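
To make the difference concrete, here is a minimal sketch comparing the two quantities. It does not call pycld2. The details tuple below is made-up sample data in the documented (languageName, languageCode, percent, score) shape. The helper names percent_scores and normalized_scores are hypothetical. Dividing each score by the sum of scores is only one possible normalization, assumed here for illustration; it is not something pycld2 or spacy-cld prescribes.

```python
from typing import Dict, Sequence, Tuple

# (languageName, languageCode, percent, score), as described in the pycld2 docs
Detail = Tuple[str, str, int, float]


def percent_scores(details: Sequence[Detail]) -> Dict[str, float]:
    """Share of the text attributed to each language (percent / 100)."""
    return {code: percent / 100 for _, code, percent, _ in details if code != "un"}


def normalized_scores(details: Sequence[Detail]) -> Dict[str, float]:
    """Hypothetical normalization: each score divided by the sum of all scores."""
    known = [(code, score) for _, code, _, score in details if code != "un"]
    total = sum(score for _, score in known)
    if total == 0:
        # No usable scores, so avoid dividing by zero
        return {code: 0.0 for code, _ in known}
    return {code: score / total for code, score in known}


# Made-up values for illustration only, not real pycld2 output
details = (
    ("FRENCH", "fr", 60, 900.0),
    ("ENGLISH", "en", 39, 300.0),
    ("Unknown", "un", 0, 0.0),
)

print(percent_scores(details))
print(normalized_scores(details))
```

With this sample, the two functions rank the languages the same way but give different values, because one measures text coverage and the other the relative size of the scores.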

Thanks
