skill: Add reasoning skill about siblings #736

russellb · 2024-04-21T23:32:45Z

This skill was inspired by one of the tests included in mt-bench, which contains some questions that the model answered incorrectly. When I gave the model the same question manually, I got the following incorrect answer:

Q: David has three sisters. Each of them has one brother. How many
brothers does David have?
A: David has three sisters, and each of them has one brother, so it
must be the same brother in all cases. Therefore, David has 1 brother.
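
To see why that answer is wrong, the puzzle can be modeled directly. The sketch below is illustrative only and not part of this PR. It assumes David is male and that the family is the smallest one satisfying the premises. The `Sibling` class, the `count_brothers` helper, and the sister names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sibling:
    name: str
    is_male: bool


def count_brothers(person, family):
    """Count male siblings of `person`, excluding the person themselves."""
    return sum(1 for s in family if s.is_male and s != person)


# Assumption: David is male (the puzzle implies this but never states it).
david = Sibling("David", True)
# Hypothetical sister names; the puzzle does not name them.
sisters = [Sibling(f"Sister {i}", False) for i in (1, 2, 3)]
family = [david, *sisters]

# Premise check: each sister has exactly one brother.
assert all(count_brothers(s, family) == 1 for s in sisters)

print(count_brothers(david, family))  # prints 0
```

Under those assumptions, the one brother each sister has is David himself, so David has no brothers.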

instruct-lab-bot · 2024-04-21T23:32:57Z

Beep, boop 🤖, Hi, I'm @instructlab-bot and I'm going to help you with your pull request. Thanks for your contribution! 🎉

I support the following commands:

@instructlab-bot precheck -- Check existing model behavior using the questions in this proposed change.
@instructlab-bot generate -- Generate a sample of synthetic data using the synthetic data generation backend infrastructure.
@instructlab-bot generate-local -- Generate a sample of synthetic data using a local model.
@instructlab-bot help -- Print this help message again.

Note

Results or Errors of these commands will be posted as a pull request check in the Checks section below

Note

Currently, only maintainers belonging to the [[taxonomy-triagers taxonomy-approvers taxonomy-maintainers labrador-org-maintainers instruct-lab-bot-maintainers]] teams are allowed to run these commands.

russellb · 2024-04-21T23:33:10Z

Originally posted as #365

russellb · 2024-04-21T23:33:22Z

@instructlab-bot precheck

instruct-lab-bot · 2024-04-21T23:33:25Z

Beep, boop 🤖, Generating test data for your PR with the job type: precheck. Your Job ID is 180. The results will be presented below in the pull request status box. This may take several minutes...

russellb · 2024-04-21T23:33:28Z

@instructlab-bot generate

instruct-lab-bot · 2024-04-21T23:33:30Z

Beep, boop 🤖, Generating test data for your PR with the job type: sdg-svc. Your Job ID is 181. The results will be presented below in the pull request status box. This may take several minutes...

instruct-lab-bot · 2024-04-21T23:34:00Z

Results for job ID: 180 using the model merlinite-7b!

Results can be found here.

instruct-lab-bot · 2024-04-21T23:36:41Z

Results for job ID: 181 using the model sdg service backend!

Results can be found here.

mingxzhao

Based on the SDG output, it seems the model will have some trouble coming up with good synthetic data. One problem I can see beyond the logical flaws is the gendering of the names. We assume David is male, but the model may not make the same assumption, so the question about the number of brothers becomes more ambiguous.

russellb · 2024-04-23T04:49:51Z

> Based on the SDG output, it seems the model will have some trouble coming up with good synthetic data. One problem I can see beyond the logical flaws is the gendering of the names. We assume David is male, but the model may not make the same assumption, so the question about the number of brothers becomes more ambiguous.

Good feedback, thanks. I'll look at improving this at some point, and I'm happy to take suggested improvements from anyone interested.

For reference, this was based on an mt-bench question that the model did not handle well.

https://github.com/lm-sys/FastChat/blob/f22f2194c9152a25d2987e5118206e3bbb9efd5e/fastchat/llm_judge/data/mt_bench/question.jsonl#L24

mingxzhao · 2024-04-23T04:59:55Z

Ah, I see. Then I may just have to go argue with the mt-bench mods. Maybe try prefacing the question with a statement confirming that David is male. I doubt this will let the model solve it, but it would at least tell us whether the assumption about David's gender was what tripped up the model's reasoning.

jjasghar · 2024-06-07T17:19:19Z

@instructlab-bot precheck

instruct-lab-bot · 2024-06-07T17:19:21Z

Beep, boop 🤖, Generating test data for your PR with the job type: precheck. Your Job ID is 377. The results will be presented below in the pull request status box. This may take several minutes...

instruct-lab-bot · 2024-06-07T17:19:46Z

Results for job ID: 377 using the model instructlab/granite-7b-lab!

Results can be found here.

bjhargrave · 2024-08-28T18:45:40Z

I relocated the folder to the new taxonomy organization. I also updated the seed examples to state explicitly whether the subject is a brother or a sister instead of assuming it from the name.

mcorbin-ibm · 2024-09-23T18:47:19Z

@bjhargrave

> I relocated the folder to the new taxonomy organization (compositional_skills/philosophy/logic/induction/siblings/)

I think I would change "induction" to "inductive_reasoning", because "induction" has too many other connotations and "inductive_reasoning" makes the meaning more apparent.

This skill was inspired by one of the tests included in mt-bench. It included some questions that the model answered incorrectly. When given the same question manually, I got the following incorrect answer:

> Q: David has three sisters. Each of them has one brother. How many
> brothers does David have?
> A: David has three sisters, and each of them has one brother, so it
> must be the same brother in all cases. Therefore, David has 1 brother.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: BJ Hargrave <hargrave@us.ibm.com>

instruct-lab-bot · 2024-09-24T21:27:42Z

Beep, boop 🤖, Hi, I'm @instructlab-bot and I'm going to help you with your pull request. Thanks for your contribution! 🎉

I support the following commands:

@instructlab-bot precheck -- Check existing model behavior using the questions in this proposed change.
@instructlab-bot generate -- Generate a sample of synthetic data using the synthetic data generation backend infrastructure.
@instructlab-bot generate-local -- Generate a sample of synthetic data using a local model.
@instructlab-bot help -- Print this help message again.

Note

Results or Errors of these commands will be posted as a pull request check in the Checks section below

Note

Currently, only maintainers belonging to the [[taxonomy-triagers taxonomy-approvers taxonomy-maintainers instructlab-bot-triagers instructlab-bot-maintainers oversight-committee]] teams are allowed to run these commands.

russellb requested a review from a team as a code owner April 21, 2024 23:32

github-actions bot added the triage-needed (auto-labeled: skill is ready to be triaged) and skill (auto-labeled) labels Apr 21, 2024

mingxzhao requested changes Apr 23, 2024

mingxzhao added the triage-requested-changes (skill has been reviewed; changes requested from contributor) and triage-uncertain (triager is uncertain, which can be for a variety of reasons) labels and removed the triage-needed label Apr 23, 2024

RobotSail force-pushed the main branch from 825fde4 to 9ae7bfa April 23, 2024 17:40

bjhargrave force-pushed the reasoning-siblings branch from ed04050 to 9157839 April 23, 2024 21:59

github-actions bot added the triage-needed (auto-labeled: skill is ready to be triaged) label Apr 23, 2024

instructlab deleted a comment from RobotSail Apr 24, 2024

mingxzhao removed the triage-needed and triage-uncertain labels May 7, 2024

bjhargrave force-pushed the reasoning-siblings branch from 9157839 to 3bfb1be August 27, 2024 16:46

github-actions bot added the triage-needed (auto-labeled: skill is ready to be triaged) label Aug 27, 2024

bjhargrave force-pushed the reasoning-siblings branch from 3bfb1be to ad8b6ce August 28, 2024 18:44

bjhargrave added the community-build-ready (Triage Team has signed off for synthetic data generation) label and removed the triage-needed and triage-requested-changes labels Aug 28, 2024

bjhargrave approved these changes Aug 28, 2024

bjhargrave requested a review from mingxzhao August 28, 2024 18:46

bjhargrave force-pushed the reasoning-siblings branch from ad8b6ce to 6c376d0 September 24, 2024 21:27

github-actions bot added the triage-needed (auto-labeled: skill is ready to be triaged) label Sep 24, 2024
