Wrong guess score with multiple copies of a letter #1

gpiancastelli · 2022-05-21T22:50:10Z

There's a bug in your score_guess function. If the guess contains two copies of a letter, and that letter is present only once in the answer, and the second copy in the guess matches that letter in the answer, the first copy will be marked as WRONG_PLACE, while the second copy will be marked as NO.

I'm not an English native speaker, so I can't come up with a real example just right now. But let's say we have 'A__A_' as our guess and '___A_' as the answer (let's just ignore letters marked as _, which we presume are all different from A). Your score_guess function will return '🟨__⬜_' instead of '⬜__🟩_'.

The text was updated successfully, but these errors were encountered:

OverkillGuy · 2022-05-21T23:10:22Z

Hey, thanks for the bug report!

I grepped for answer/guess words that match this format, to feed as examples. Found this:

secret answer: uncut (repeating u)
guess: zebus (only one u, at second place)

This seems to match your definition. I've created these as a test case in a branch of the repo (not using the fancy literate programming tools), but I can't seem to reproduce your report of score 🟨__⬜_, instead I do get this: ⬜__🟩_?

Curious to investigate this further!

OverkillGuy · 2022-05-21T23:14:10Z

Oh no, I misunderstood your question, you have the reverse of mine (I swapped answer and guess). I'll pick words again, sorry!

OverkillGuy · 2022-05-21T23:22:18Z

You are right indeed! See proper commit, using:

Guess: xenon (two n)
Answer: train (one n, at the end)

Score should be ⬜⬜⬜⬜🟩, actually is ⬜⬜🟨⬜⬜!

Bug confirmed! Thank you! I'll see what I can do (any ideas?)

Get rid of the perfect scores in a first pass, to avoid bug

OverkillGuy · 2022-05-21T23:54:00Z

Okay, I've got the draft of a solution which passes tests again in #2.

I don't like the look of the fix (changing response to a dict[int, str] is ugly) but this is a start. I'll revisit later.

@gpiancastelli I'd be happy to hear your feedback on this fix.

Of course, once the fix is ready on the code side, I'll have to make an update to the actual "story" that explains our change, and I'll be sure to credit this finding to you.

gpiancastelli · 2022-05-22T10:58:15Z

I wrote a possible solution yesterday but I need to see if I can change one or two things that I don't like. It's both similar (two passes, still uses a Counter) and different (the Counter is not used for the entire answer) from yours. Hope to post something in the next few hours.

gpiancastelli · 2022-05-22T12:22:24Z

So here's my take:

def score_guess(guess, answer):
    result = [CharacterScore.NO] * len(answer)

    rem_answer = Counter() # unmatched letters in answer
    rem_guess = [] # indexes of unmatched letters in guess

    # check for exact matches
    for i, (a, g) in enumerate(zip(answer, guess)):
        if a == g:
            result[i] = CharacterScore.OK
        else:
            rem_answer[a] += 1
            rem_guess.append(i)

    # check for letters in other places
    for i in rem_guess:
        letter = guess[i]
        if letter in rem_answer:
            rem_answer[letter] -= 1
            result[i] = CharacterScore.WRONG_PLACE
        if rem_answer[letter] == 0:
            del rem_answer[letter]

    return ''.join(result)

Unfortunately, I haven't been able to imagine a different algorithm that would eliminate the need for two loops. But the result is a plain list. The Counter is needed only for the letters remaining after the check for exact matches.

I have worked off-project (i.e. in a separate script file) so of course please double check that all your tests are passing. Sorry about that, but I believe I haven't touched Python seriously in the last year and a half.

OverkillGuy · 2022-05-22T13:48:13Z

Ooh, thanks.

As mentioned above, I wanted to make another pass to remove that dict[int, str] for response, and was idly wondering how I could use a list[str], maybe even defaulting to CharacterScore.NO, and you showcased that very elegantly!

I've run your code as-is in the project, and it does absolutely pass all tests! Separately, I'm curious if you want to try the project's structure.

Interesting approach, iterating on only the unmatched letters! It means another datastructure or two, but it gives a more specific iteration, that's cool.
I was trying to remove my uses of del in existing code, as it is an unusual part of Python I don't want to encourage, and it felt less pretty than the rest. I also may not want to go for the different datastructure, keeping it simpler, at the cost of skipping over NO case explicitly, which feels like acceptable tradeoff for less datastructures to juggle.

Thanks to your code as inspiration, I now have the following, which feels like a nice balance of short + elegant, while keeping in the spirit of the existing (if buggy) code:

def score_guess(guess: str, answer: str) -> str:
    """Score an individual guess with Counter"""
    # Counter("abbey") = Counter({'b': 2, 'a': 1, 'e': 1, 'y': 1})
    answer_chars = Counter(answer)
    # NO is the default score, no need to detect it explicitly
    response: list[str] = [CharacterScore.NO] * len(answer)
    # First pass to detect perfect scores
    for char_index, (answer_char, guess_char) in enumerate(zip(guess, answer)):
        if answer_char == guess_char:
            response[char_index] = CharacterScore.OK
            answer_chars[guess_char] -= 1
    # Second pass for the yellows
    for char_num, (guess_char, existing_score) in enumerate(zip(guess, response)):
        if existing_score == CharacterScore.OK:
            continue  # It's already green: skip
        if guess_char in answer_chars and answer_chars[guess_char] > 0:
            response[char_num] = CharacterScore.WRONG_PLACE
            # Reduce occurence counter since we "used" this occurence
            answer_chars[guess_char] -= 1
    return "".join(response)

Passes all tests, and is much nicer than my previous code. Will let it simmer for the rest of the weekend, and then I'll look into writing an addendum to the narrative, explaining the bug, the fix, and reflecting on the reason the bug wasn't caught during original development.

gpiancastelli · 2022-05-22T15:28:25Z

Counter is both a blessing and a curse. It does everything we need, but in a weird way which does not exactly correspond to an ideal bag/multiset. If it wasn't there, we might have defined a data structure with a "better" interface and implementation.

I like the idea of removing del, but just because it really feels unnecessary. You could have switched to calling pop, but decided to change the way you check the presence of guess_char inside answer_chars, which IMHO is more in line with how Counter works.

I still like the additional data structures I introduced, both because they are specific to the second iteration (as you said) and because in the case of the winning guess they are empty and you never enter that iteration. But maybe your version is more streamlined, I can't really say. I think a middle-ground version could be written, where the Counter is used like I did, but there is no additional list and the second loop runs on the (possibly partial) result as you did. But, oh well, to each their own I guess.

Thanks for the conversation. I will gladly read your reflections on the bug once you update the document.

OverkillGuy · 2022-06-13T00:58:40Z

Just merged PR #2 fixing this bug, and published the associated addendum on the narrative, which I released as v1.1.0 tag. I hope you enjoy the few extra words that your report allowed me to write.

Thanks again for taking the time to report such a flagrant error, and for discussing solutions, that was very helpful.

Have a nice day,
Jb

gpiancastelli · 2022-06-15T21:41:50Z

Hi again, thanks for notifying me!

Just a further heads-up. You devoted a paragraph to the use of get in the following line:

if answer_chars.get(guess_char, 0) > 0:

to catch both the “key is not set” as well as “key is set to zero” cases in the same way.

However, answer_chars is not a simple dict; it's a Counter. And the documentation says that "Counter objects have a dictionary interface except that they return a zero count for missing items." So your line could just be written as:

if answer_chars[guess_char] > 0:

If I may, I would suggest to use Counter by its proper protocol, make the reader notice it, and enrich the discussion by citing the existence of get for "raw" dict objects.

OverkillGuy · 2022-06-18T12:45:48Z

Just a further heads-up. [...]

Fantastic point, that makes my life easier (I was mildly annoyed at having to make the code a tiny bit less pretty for non-experts, this fixes it).

Created as #3, will merge within the week after I let it cool down for an editing pass. Feel free to comment on that as we iterate.

I expect this to be a patch release, updating the changelog but not a whole new section.

OverkillGuy · 2022-06-27T00:17:17Z

Merged and deployed as v1.1.1!

Thanks again for your help.

OverkillGuy pushed a commit that referenced this issue May 21, 2022

Add example of issue #1

f871057

OverkillGuy pushed a commit that referenced this issue May 21, 2022

Fix bug #1 via first pass for perfects

6f11c4c

Get rid of the perfect scores in a first pass, to avoid bug

OverkillGuy mentioned this issue May 21, 2022

Fix bad scoring for multiple same char in guess #2

Merged

OverkillGuy closed this as completed in #2 Jun 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong guess score with multiple copies of a letter #1

Wrong guess score with multiple copies of a letter #1

gpiancastelli commented May 21, 2022

OverkillGuy commented May 21, 2022 •

edited

Loading

OverkillGuy commented May 21, 2022

OverkillGuy commented May 21, 2022

OverkillGuy commented May 21, 2022 •

edited

Loading

gpiancastelli commented May 22, 2022 •

edited

Loading

gpiancastelli commented May 22, 2022 •

edited

Loading

OverkillGuy commented May 22, 2022

gpiancastelli commented May 22, 2022 •

edited

Loading

OverkillGuy commented Jun 13, 2022 •

edited

Loading

gpiancastelli commented Jun 15, 2022 •

edited

Loading

OverkillGuy commented Jun 18, 2022

OverkillGuy commented Jun 27, 2022 •

edited

Loading

Wrong guess score with multiple copies of a letter #1

Wrong guess score with multiple copies of a letter #1

Comments

gpiancastelli commented May 21, 2022

OverkillGuy commented May 21, 2022 • edited Loading

OverkillGuy commented May 21, 2022

OverkillGuy commented May 21, 2022

OverkillGuy commented May 21, 2022 • edited Loading

gpiancastelli commented May 22, 2022 • edited Loading

gpiancastelli commented May 22, 2022 • edited Loading

OverkillGuy commented May 22, 2022

gpiancastelli commented May 22, 2022 • edited Loading

OverkillGuy commented Jun 13, 2022 • edited Loading

gpiancastelli commented Jun 15, 2022 • edited Loading

OverkillGuy commented Jun 18, 2022

OverkillGuy commented Jun 27, 2022 • edited Loading

OverkillGuy commented May 21, 2022 •

edited

Loading

OverkillGuy commented May 21, 2022 •

edited

Loading

gpiancastelli commented May 22, 2022 •

edited

Loading

gpiancastelli commented May 22, 2022 •

edited

Loading

gpiancastelli commented May 22, 2022 •

edited

Loading

OverkillGuy commented Jun 13, 2022 •

edited

Loading

gpiancastelli commented Jun 15, 2022 •

edited

Loading

OverkillGuy commented Jun 27, 2022 •

edited

Loading