Add Winogrande evaluation #5015

Merged
merged 4 commits into from
Jan 18, 2024

Commits on Jan 17, 2024

  1. winogrande: simple implementation

    It doesn't look like it is working. Why?
    For Mistral-7B the score is barely better than
    random chance (~60% for 1267 tasks), while
    Mistral-7B scores 78.4% on the HF leaderboard.
    The 1-sigma statistical uncertainty for 1267 tasks is ~1.4%,
    so the difference cannot be due to statistics.
    Kawrakow committed Jan 17, 2024
    09db8bd
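
The ~1.4% figure in the first commit message can be checked with a short calculation. This is a minimal sketch, assuming each of the 1267 tasks is an independent pass/fail trial so that the binomial standard error applies; the function name `binomial_std_error` is made up for illustration and is not part of the PR.

```python
import math


def binomial_std_error(accuracy: float, n_tasks: int) -> float:
    """1-sigma standard error of an accuracy measured on n_tasks independent tasks."""
    if n_tasks <= 0:
        raise ValueError("n_tasks must be positive")
    return math.sqrt(accuracy * (1.0 - accuracy) / n_tasks)


n_tasks = 1267    # number of tasks quoted in the commit message
observed = 0.60   # approximate score from the first commit
reported = 0.784  # HF leaderboard score quoted in the commit message

sigma = binomial_std_error(observed, n_tasks)
gap_in_sigmas = (reported - observed) / sigma

print(f"1-sigma uncertainty: {100 * sigma:.2f}%")
print(f"gap to reported score: {gap_in_sigmas:.1f} sigma")
```

Under that assumption the uncertainty comes out at roughly 1.4%, so an 18-point gap is more than ten standard deviations and cannot plausibly be a statistical fluctuation.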

Commits on Jan 18, 2024

  1. winogrande: somewhat better

    Score for Mistral-7B is now 68.9 on the validation set of
    winogrande_debiased. Still far from the reported 78.4, but
    better than what I had before.
    Kawrakow committed Jan 18, 2024
    2605b92
  2. winogrande: improving

    Mistral-7B score is now 73.56.
    Still not quite 78.4 but getting there.
    We are also getting a lower score on HellaSwag
    compared to the HF leaderboard, so I'm not expecting
    we will get up to 78.4 anyway.
    
    It looks like it is better to skip the choice word(s)
    when evaluating the average log-likelihood. This kind of
    makes sense because a more common word (in Winogrande this is
    often a name) will have a higher probability before the model
    has seen the follow-up context, and this will skew the log-likelihood
    towards the more common word. We can only do this if the
    choice words are not last in the sentence.
    
    It also looks like it is better to skip the punctuation at the
    end of the sentence, provided the choice words are not last.
    Kawrakow committed Jan 18, 2024
    e0d4439
  3. e3a17dc
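
The scoring change described in the "winogrande: improving" commit can be sketched as follows. This is not the code from the PR, only an illustration under stated assumptions: each candidate sentence is given as a list of per-token log-probabilities, `choice_end` is the index of the first token after the choice word(s), and the last token is assumed to be the closing punctuation. The names `tail_score` and `pick_choice` are hypothetical, and the fallback to the full-sentence average when the choice is last is an assumption, since the commit message does not say what happens in that case.

```python
from typing import Sequence


def tail_score(token_logprobs: Sequence[float], choice_end: int,
               skip_last_punct: bool = True) -> float:
    """Average log-likelihood over the tokens that follow the choice word(s)."""
    n = len(token_logprobs)
    if n == 0:
        raise ValueError("token_logprobs must not be empty")
    if choice_end >= n:
        # Choice word(s) are last in the sentence: nothing follows them,
        # so fall back to the full-sentence average (assumption).
        span = list(token_logprobs)
    else:
        # Drop the final token (assumed to be punctuation) only if at
        # least one other token remains after the choice word(s).
        drop_last = skip_last_punct and (n - choice_end) > 1
        end = n - 1 if drop_last else n
        span = list(token_logprobs[choice_end:end])
    return sum(span) / len(span)


def pick_choice(logprobs_1: Sequence[float], end_1: int,
                logprobs_2: Sequence[float], end_2: int) -> int:
    """Return 1 or 2, whichever candidate has the higher tail score."""
    return 1 if tail_score(logprobs_1, end_1) >= tail_score(logprobs_2, end_2) else 2


# Made-up numbers for illustration: candidate 1 starts with a common
# word (log-prob -2.0), candidate 2 with a rare one (log-prob -6.0).
cand_1 = [-2.0, -0.5, -1.0, -1.2, -0.3]
cand_2 = [-6.0, -0.5, -0.4, -0.6, -0.3]

full_avg_1 = sum(cand_1) / len(cand_1)
full_avg_2 = sum(cand_2) / len(cand_2)
answer = pick_choice(cand_1, 1, cand_2, 1)
```

In this toy example the full-sentence average favors candidate 1 purely because its first word is more common, while the tail score favors candidate 2, whose continuation the model finds more likely. That is the skew the commit message describes.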