Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong guess score with multiple copies of a letter #1

Closed
gpiancastelli opened this issue May 21, 2022 · 12 comments · Fixed by #2
Closed

Wrong guess score with multiple copies of a letter #1

gpiancastelli opened this issue May 21, 2022 · 12 comments · Fixed by #2

Comments

@gpiancastelli
Copy link

There's a bug in your score_guess function. If the guess contains two copies of a letter, and that letter is present only once in the answer, and the second copy in the guess matches that letter in the answer, the first copy will be marked as WRONG_PLACE, while the second copy will be marked as NO.

I'm not an English native speaker, so I can't come up with a real example just right now. But let's say we have 'A__A_' as our guess and '___A_' as the answer (let's just ignore letters marked as _, which we presume are all different from A). Your score_guess function will return '🟨__⬜_' instead of '⬜__🟩_'.

OverkillGuy pushed a commit that referenced this issue May 21, 2022
@OverkillGuy
Copy link
Owner

OverkillGuy commented May 21, 2022

Hey, thanks for the bug report!

I grepped for answer/guess words that match this format, to feed as examples. Found this:

  • secret answer: uncut (repeating u)
  • guess: zebus (only one u, at second place)

This seems to match your definition. I've created these as a test case in a branch of the repo (not using the fancy literate programming tools), but I can't seem to reproduce your report of score 🟨__⬜_, instead I do get this: ⬜__🟩_?

Curious to investigate this further!

@OverkillGuy
Copy link
Owner

Oh no, I misunderstood your question, you have the reverse of mine (I swapped answer and guess). I'll pick words again, sorry!

@OverkillGuy
Copy link
Owner

You are right indeed! See proper commit, using:

Guess: xenon (two n)
Answer: train (one n, at the end)

Score should be ⬜⬜⬜⬜🟩, actually is ⬜⬜🟨⬜⬜!

Bug confirmed! Thank you! I'll see what I can do (any ideas?)

OverkillGuy pushed a commit that referenced this issue May 21, 2022
Get rid of the perfect scores in a first pass, to avoid bug
@OverkillGuy
Copy link
Owner

OverkillGuy commented May 21, 2022

Okay, I've got the draft of a solution which passes tests again in #2.

I don't like the look of the fix (changing response to a dict[int, str] is ugly) but this is a start. I'll revisit later.

@gpiancastelli I'd be happy to hear your feedback on this fix.

Of course, once the fix is ready on the code side, I'll have to make an update to the actual "story" that explains our change, and I'll be sure to credit this finding to you.

@gpiancastelli
Copy link
Author

gpiancastelli commented May 22, 2022

I wrote a possible solution yesterday but I need to see if I can change one or two things that I don't like. It's both similar (two passes, still uses a Counter) and different (the Counter is not used for the entire answer) from yours. Hope to post something in the next few hours.

@gpiancastelli
Copy link
Author

gpiancastelli commented May 22, 2022

So here's my take:

def score_guess(guess, answer):
    result = [CharacterScore.NO] * len(answer)

    rem_answer = Counter() # unmatched letters in answer
    rem_guess = [] # indexes of unmatched letters in guess

    # check for exact matches
    for i, (a, g) in enumerate(zip(answer, guess)):
        if a == g:
            result[i] = CharacterScore.OK
        else:
            rem_answer[a] += 1
            rem_guess.append(i)

    # check for letters in other places
    for i in rem_guess:
        letter = guess[i]
        if letter in rem_answer:
            rem_answer[letter] -= 1
            result[i] = CharacterScore.WRONG_PLACE
        if rem_answer[letter] == 0:
            del rem_answer[letter]

    return ''.join(result)

Unfortunately, I haven't been able to imagine a different algorithm that would eliminate the need for two loops. But the result is a plain list. The Counter is needed only for the letters remaining after the check for exact matches.

I have worked off-project (i.e. in a separate script file) so of course please double check that all your tests are passing. Sorry about that, but I believe I haven't touched Python seriously in the last year and a half.

@OverkillGuy
Copy link
Owner

Ooh, thanks.

As mentioned above, I wanted to make another pass to remove that dict[int, str] for response, and was idly wondering how I could use a list[str], maybe even defaulting to CharacterScore.NO, and you showcased that very elegantly!

I've run your code as-is in the project, and it does absolutely pass all tests! Separately, I'm curious if you want to try the project's structure.

Interesting approach, iterating on only the unmatched letters! It means another datastructure or two, but it gives a more specific iteration, that's cool.
I was trying to remove my uses of del in existing code, as it is an unusual part of Python I don't want to encourage, and it felt less pretty than the rest. I also may not want to go for the different datastructure, keeping it simpler, at the cost of skipping over NO case explicitly, which feels like acceptable tradeoff for less datastructures to juggle.

Thanks to your code as inspiration, I now have the following, which feels like a nice balance of short + elegant, while keeping in the spirit of the existing (if buggy) code:

def score_guess(guess: str, answer: str) -> str:
    """Score an individual guess with Counter"""
    # Counter("abbey") = Counter({'b': 2, 'a': 1, 'e': 1, 'y': 1})
    answer_chars = Counter(answer)
    # NO is the default score, no need to detect it explicitly
    response: list[str] = [CharacterScore.NO] * len(answer)
    # First pass to detect perfect scores
    for char_index, (answer_char, guess_char) in enumerate(zip(guess, answer)):
        if answer_char == guess_char:
            response[char_index] = CharacterScore.OK
            answer_chars[guess_char] -= 1
    # Second pass for the yellows
    for char_num, (guess_char, existing_score) in enumerate(zip(guess, response)):
        if existing_score == CharacterScore.OK:
            continue  # It's already green: skip
        if guess_char in answer_chars and answer_chars[guess_char] > 0:
            response[char_num] = CharacterScore.WRONG_PLACE
            # Reduce occurence counter since we "used" this occurence
            answer_chars[guess_char] -= 1
    return "".join(response)

Passes all tests, and is much nicer than my previous code. Will let it simmer for the rest of the weekend, and then I'll look into writing an addendum to the narrative, explaining the bug, the fix, and reflecting on the reason the bug wasn't caught during original development.

@gpiancastelli
Copy link
Author

gpiancastelli commented May 22, 2022

Counter is both a blessing and a curse. It does everything we need, but in a weird way which does not exactly correspond to an ideal bag/multiset. If it wasn't there, we might have defined a data structure with a "better" interface and implementation.

I like the idea of removing del, but just because it really feels unnecessary. You could have switched to calling pop, but decided to change the way you check the presence of guess_char inside answer_chars, which IMHO is more in line with how Counter works.

I still like the additional data structures I introduced, both because they are specific to the second iteration (as you said) and because in the case of the winning guess they are empty and you never enter that iteration. But maybe your version is more streamlined, I can't really say. I think a middle-ground version could be written, where the Counter is used like I did, but there is no additional list and the second loop runs on the (possibly partial) result as you did. But, oh well, to each their own I guess.

Thanks for the conversation. I will gladly read your reflections on the bug once you update the document.

@OverkillGuy
Copy link
Owner

OverkillGuy commented Jun 13, 2022

Just merged PR #2 fixing this bug, and published the associated addendum on the narrative, which I released as v1.1.0 tag. I hope you enjoy the few extra words that your report allowed me to write.

Thanks again for taking the time to report such a flagrant error, and for discussing solutions, that was very helpful.

Have a nice day,
Jb

@gpiancastelli
Copy link
Author

gpiancastelli commented Jun 15, 2022

Hi again, thanks for notifying me!

Just a further heads-up. You devoted a paragraph to the use of get in the following line:

if answer_chars.get(guess_char, 0) > 0:

to catch both the “key is not set” as well as “key is set to zero” cases in the same way.

However, answer_chars is not a simple dict; it's a Counter. And the documentation says that "Counter objects have a dictionary interface except that they return a zero count for missing items." So your line could just be written as:

if answer_chars[guess_char] > 0:

If I may, I would suggest to use Counter by its proper protocol, make the reader notice it, and enrich the discussion by citing the existence of get for "raw" dict objects.

@OverkillGuy
Copy link
Owner

Just a further heads-up. [...]

Fantastic point, that makes my life easier (I was mildly annoyed at having to make the code a tiny bit less pretty for non-experts, this fixes it).

Created as #3, will merge within the week after I let it cool down for an editing pass. Feel free to comment on that as we iterate.

I expect this to be a patch release, updating the changelog but not a whole new section.

@OverkillGuy
Copy link
Owner

OverkillGuy commented Jun 27, 2022

Merged and deployed as v1.1.1!

Thanks again for your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants