skill: Add reasoning skill about siblings #736

Open
wants to merge 1 commit into
base: main

russellb
Member

This skill was inspired by one of the tests included in mt-bench. It included some questions that the model answered incorrectly. When given the same question manually, I got the following incorrect answer:

Q: David has three sisters. Each of them has one brother. How many
brothers does David have?
A: David has three sisters, and each of them has one brother, so it
must be the same brother in all cases. Therefore, David has 1 brother.
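
For reference, the expected answer is zero: David is himself the one brother that each sister has. A minimal Python sketch of that reasoning follows. It assumes David is male, which the puzzle does not state explicitly, and the `Person` record and `brothers_of` helper are hypothetical names used only for illustration; they are not part of this PR or of mt-bench.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    gender: str  # "male" or "female"; stated explicitly, not inferred from the name


def brothers_of(person, family):
    """Return every male sibling of `person`, excluding the person themselves."""
    return [p for p in family if p.gender == "male" and p != person]


# Assumption: David is male. The puzzle only implies this.
david = Person("David", "male")
sisters = [Person(f"Sister {i}", "female") for i in range(1, 4)]
family = [david, *sisters]

# Each sister has exactly one brother, and that brother is David.
for sister in sisters:
    assert brothers_of(sister, family) == [david]

# David cannot be his own brother, so he has none.
print(len(brothers_of(david, family)))  # 0
```

The model's answer counts David as his own brother, which is the error this skill targets.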

@russellb requested a review from a team as a code owner April 21, 2024 23:32
@github-actions bot added the triage-needed and skill labels Apr 21, 2024

Beep, boop 🤖, Hi, I'm @instructlab-bot and I'm going to help you with your pull request. Thanks for your contribution! 🎉

I support the following commands:

  • @instructlab-bot precheck -- Check existing model behavior using the questions in this proposed change.
  • @instructlab-bot generate -- Generate a sample of synthetic data using the synthetic data generation backend infrastructure.
  • @instructlab-bot generate-local -- Generate a sample of synthetic data using a local model.
  • @instructlab-bot help -- Print this help message again.

Note

Results or Errors of these commands will be posted as a pull request check in the Checks section below

Note

Currently, only members of the [[taxonomy-triagers taxonomy-approvers taxonomy-maintainers labrador-org-maintainers instruct-lab-bot-maintainers]] teams are allowed to run these commands.

@russellb
Member Author

Originally posted as #365

@russellb
Member Author

@instructlab-bot precheck


Beep, boop 🤖, Generating test data for your PR with the job type: precheck. Your Job ID is 180. The results will be presented below in the pull request status box. This may take several minutes...

@russellb
Member Author

@instructlab-bot generate


Beep, boop 🤖, Generating test data for your PR with the job type: sdg-svc. Your Job ID is 181. The results will be presented below in the pull request status box. This may take several minutes...


Results for job ID: 180 using the model merlinite-7b!

Results can be found here.


Results for job ID: 181 using the model sdg service backend!

Results can be found here.

Member

@mingxzhao left a comment

Based on the SDG output, it seems the model will have some trouble coming up with good synthetic data. One problem I can see, beyond the logical flaws, is the gendering of the names. We assume David is male, but the model may not make the same assumption, so the question about the number of brothers becomes ambiguous.

@mingxzhao added the triage-requested-changes and triage-uncertain labels and removed the triage-needed label Apr 23, 2024
@russellb
Member Author

> Based on SDG, seems model will have a bit of trouble coming up with good synthetic data. One problem that I can see beyond logical flaws is the gendering of the names. We assume David is male but model may not make same assumption, thus the question of # of brothers is more ambiguous.

Good feedback, thanks. I'll try to improve this at some point, and I'm happy to take suggested improvements from anyone interested.

For reference, this was based on an mt-bench question that the model did not handle well.

https://github.com/lm-sys/FastChat/blob/f22f2194c9152a25d2987e5118206e3bbb9efd5e/fastchat/llm_judge/data/mt_bench/question.jsonl#L24

@mingxzhao
Member

Ah, I see. Then I may just have to go argue with the mt-bench mods. Maybe try prefacing the question with a statement confirming that David is male. I doubt this will be enough for the model to solve it, but it would at least tell us whether the assumption about David's gender was the issue with the model's reasoning.

@github-actions bot added the triage-needed label Apr 23, 2024
@instructlab instructlab deleted a comment from RobotSail Apr 24, 2024
@mingxzhao removed the triage-needed and triage-uncertain labels May 7, 2024
@jjasghar
Member

jjasghar commented Jun 7, 2024

@instructlab-bot precheck


Beep, boop 🤖, Generating test data for your PR with the job type: precheck. Your Job ID is 377. The results will be presented below in the pull request status box. This may take several minutes...


Results for job ID: 377 using the model instructlab/granite-7b-lab!

Results can be found here.

@github-actions bot added the triage-needed label Aug 27, 2024
@bjhargrave
Contributor

I relocated the folder to the new taxonomy organization. I also updated the seed examples to state explicitly whether the subject is a brother or a sister instead of inferring it from the name.

@bjhargrave added the community-build-ready label and removed the triage-needed and triage-requested-changes labels Aug 28, 2024
@mcorbin-ibm
Contributor

@bjhargrave

I relocated the folder to the new taxonomy organization (compositional_skills/philosophy/logic/induction/siblings/)

I think I would change "induction" to "inductive_reasoning" because induction has too many other connotations and inductive_reasoning makes it more apparent.

This skill was inspired by one of the tests included in mt-bench. It
included some questions that the model answered incorrectly. When
given the same question manually, I got the following incorrect answer:

> Q: David has three sisters. Each of them has one brother. How many
>    brothers does David have?
> A: David has three sisters, and each of them has one brother, so it
>    must be the same brother in all cases. Therefore, David has 1 brother.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: BJ Hargrave <hargrave@us.ibm.com>
@github-actions bot added the triage-needed label Sep 24, 2024

Beep, boop 🤖, Hi, I'm @instructlab-bot and I'm going to help you with your pull request. Thanks for your contribution! 🎉

I support the following commands:

  • @instructlab-bot precheck -- Check existing model behavior using the questions in this proposed change.
  • @instructlab-bot generate -- Generate a sample of synthetic data using the synthetic data generation backend infrastructure.
  • @instructlab-bot generate-local -- Generate a sample of synthetic data using a local model.
  • @instructlab-bot help -- Print this help message again.

Note

Results or Errors of these commands will be posted as a pull request check in the Checks section below

Note

Currently, only members of the [[taxonomy-triagers taxonomy-approvers taxonomy-maintainers instructlab-bot-triagers instructlab-bot-maintainers oversight-committee]] teams are allowed to run these commands.

Labels
community-build-ready, skill, triage-needed
5 participants