Added best practice guide for QNA #3

Open: jjasghar wants to merge 1 commit into main from jjasghar/qna_bestpractices

jjasghar (Member) commented Sep 9, 2024

Best practices guide from the OCI and Lisa

Signed-off-by: JJ Asghar <awesome@ibm.com>
jjasghar force-pushed the jjasghar/qna_bestpractices branch from e4edff5 to d7bfe8a on September 9, 2024 21:04

joesepi (Member) commented Sep 23, 2024

It would be good if this were formatted better. I think someone volunteered to format it better, but I don't remember who. :)


- Things to Avoid
- Historically, LLM is bad in math
- Do not provide complex math calculation in Q&A seeds.

Contributor commented:

I'm wondering if this would look better as a single line because it references the same subject


- Context
- What if knowledge is based on documents not existing in the base model?
- In the qna.yaml file, you can pass context within a chunk of information (text from the document that Q&A are based on). Adding context to the skill QnA file might generate better-quality data.

kelbrown20 (Contributor) commented Nov 18, 2024

For the formatting, I was thinking that instead of a second bullet point, we could just leave it as a paragraph.
For example:

- What if knowledge is based on documents not existing in the base model?

  In the qna.yaml file, you can pass context within a chunk of information (text from the document 
  that Q&A are based on). Adding context to the skill QnA file might generate better-quality data.

And follow that pattern for the others as well. WDYT?

- How to check the quality of the data in a large data set of the qna.yaml file?
- You don’t have to check out synthetic data generated by the SDG process. After generating synthetic data internally, the IBM Research team is sampling to check quality (no need to check them all, especially for extensive set).

- Quality

Contributor commented:

I really like how this section is formatted!

- What if knowledge is based on documents not existing in the base model?
- In the qna.yaml file, you can pass context within a chunk of information (text from the document that Q&A are based on). Adding context to the skill QnA file might generate better-quality data.
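
To illustrate what passing context in a seed example can look like, here is a minimal sketch of a grounded skill `qna.yaml`. The field names (`version`, `created_by`, `task_description`, `seed_examples`, `context`, `question`, `answer`) are assumed from the InstructLab taxonomy layout and may differ between schema versions; the username and the kettle content are invented purely for illustration.

```yaml
# Hypothetical grounded-skill qna.yaml. Field names are an assumption based on
# the InstructLab taxonomy layout; check the current schema before relying on them.
version: 2
created_by: example-user   # placeholder GitHub username
task_description: Answer questions about a product warranty using the supplied text.
seed_examples:
  - context: |
      The Model X kettle carries a two-year warranty covering
      manufacturing defects. Damage from limescale is excluded.
    question: How long is the warranty on the Model X kettle?
    answer: The warranty lasts two years and covers manufacturing defects.
```

The `context` block holds the chunk of source text that the question and answer are grounded in, which is the mechanism the answer above describes.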

- Formatting & Front-End specific and may change

kelbrown20 (Contributor) commented Nov 18, 2024

Same thing here, something like:

- How to format data in the Q&A file especially how to format tables?

  Currently, only files in Markdown format are supported.
     - If the files are in any other format, they must be converted to Markdown format
     - For automatic converters, we recommend experimenting with other Markdown conversions like ‘markdown_strict’, ‘asciidoc’ and ‘gfm’

- The number of seed examples
- How many seeds I should provide?
- The number of seeds:
- Generating ~300 QnA pairs from ~5 seed examples is recommended by InstructLab product team.

Contributor commented:

I think the QnA, QNA and qna should be switched to Q&A to be more consistent. Right now I'm torn between Q&A and QnA, but I do think we should be consistent. What do folks think?

kelbrown20 (Contributor) left a comment

@jjasghar I had some ideas for the possible formats, but I'd def like to know your thoughts as well!

3 participants