Same token can be matched multiple times in stringMatchesReference (for evaluating answers) #132

Closed
alopezlago opened this issue Feb 19, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@alopezlago
Collaborator

website/server/scorer.js

Lines 428 to 450 in 12f2e68

// check if every token in the string is in the reference
for (let i = 0; i < stringTokens.length; i++) {
    let tokenMatches = false;
    for (let j = 0; j < referenceTokens.length; j++) {
        let errors;
        if (useStemmer) {
            errors = distance(stemmer(stringTokens[i]), stemmer(referenceTokens[j]));
        } else {
            errors = distance(stringTokens[i], referenceTokens[j]);
        }
        if (strictness * errors <= referenceTokens[j].length || (acceptSubstring && referenceTokens[j].includes(stringTokens[i]))) {
            tokenMatches = true;
            break;
        }
    }
    if (!tokenMatches) {
        return false;
    }
}

When comparing stringTokens against referenceTokens in Room.stringMatchesReference, we never remove a reference token once it has matched a string token, so the same reference token can match several different string tokens. This leads to incorrect matches. For example, the string "Major Major Major" matches the reference string "Major Tom", even though they are different.

There are a few ways to fix this. One way is to keep track of the last reference token we should still consider: when we find a matching reference token, we swap it with the token at that last position and decrease the number of reference tokens we check against.
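A minimal sketch of the swap-with-last idea, not the actual fix. The function name `tokensMatchWithoutReuse` and the default `strictness` of 5 are hypothetical, `distance` is a stand-in that assumes the real helper computes Levenshtein edit distance, and the stemmer and `acceptSubstring` branches are left out to keep it short:

```javascript
// Stand-in for the `distance` helper used in scorer.js; assumed here to be
// Levenshtein edit distance (single-row dynamic programming).
function distance(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, k) => k);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // value from the previous row, previous column
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      row[j] = Math.min(
        above + 1,                                  // deletion
        row[j - 1] + 1,                             // insertion
        diagonal + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
      diagonal = above;
    }
  }
  return row[b.length];
}

// Hypothetical helper: every string token must match a distinct reference token.
function tokensMatchWithoutReuse(stringTokens, referenceTokens, strictness = 5) {
  const remaining = referenceTokens.slice(); // copy so the caller's array is untouched
  let unusedCount = remaining.length;        // remaining[0..unusedCount) are still unused

  for (const token of stringTokens) {
    let tokenMatches = false;
    for (let j = 0; j < unusedCount; j++) {
      if (strictness * distance(token, remaining[j]) <= remaining[j].length) {
        // Move the matched token past the end of the unused region.
        unusedCount--;
        const matched = remaining[j];
        remaining[j] = remaining[unusedCount];
        remaining[unusedCount] = matched;
        tokenMatches = true;
        break;
      }
    }
    if (!tokenMatches) {
      return false;
    }
  }
  return true;
}
```

With this, `tokensMatchWithoutReuse(["major", "major", "major"], ["major", "tom"])` returns false, while `tokensMatchWithoutReuse(["tom", "major"], ["major", "tom"])` still returns true. One caveat: the matching is greedy, so with fuzzy matches a string token could consume a reference token that a later string token needed.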

@geoffrey-wu added the bug label Mar 20, 2023
@geoffrey-wu
Member

In a similar vein, token order should (probably) matter; e.g., "potential chemical" shouldn't match against "chemical potential". Probably easiest to keep an index into each array and only ever increment them (never going backwards) while checking for matches.
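A rough sketch of what the two-index approach could look like. The function name `tokensMatchInOrder` and the `tokenMatches` parameter are hypothetical; the parameter defaults to exact equality here, and the real fuzzy comparison would go in its place:

```javascript
// Hypothetical helper: string tokens must match reference tokens in the same
// relative order. Both indices only ever move forward.
function tokensMatchInOrder(stringTokens, referenceTokens, tokenMatches = (a, b) => a === b) {
  let j = 0; // index into referenceTokens
  for (const token of stringTokens) {
    // Skip reference tokens until one matches the current string token.
    while (j < referenceTokens.length && !tokenMatches(token, referenceTokens[j])) {
      j++;
    }
    if (j === referenceTokens.length) {
      return false; // ran out of reference tokens
    }
    j++; // consume the matched reference token so it cannot be reused
  }
  return true;
}
```

Here `tokensMatchInOrder(["potential", "chemical"], ["chemical", "potential"])` returns false, and `tokensMatchInOrder(["chemical", "potential"], ["chemical", "potential"])` returns true. Because a matched reference token is consumed, this also prevents one reference token from matching several string tokens.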

@geoffrey-wu
Member

For now, I'm going to assume that token order does not matter; otherwise, answers like "Australian Prime Minister" don't get accepted for "Prime Minister of Australia", for example.
