QA gen with accelerate #100

Open
Sriharsha-hatwar wants to merge 1 commit into base: main
Sriharsha-hatwar (Contributor) commented Oct 7, 2024

Hello @shamanez, @Jacobsolawetz

This PR contains the script that needs to be run to generate the QA data for e2e RAG training. Please have a look and let me know if you have any comments.

To successfully merge this with arcee-train, we need to launch this script with accelerate launch instead of calling the generate_question_answer_pairs API to get the dataset. This requires some further code changes in the arcee-train repo as well.
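For reviewers, here is a minimal sketch of what a launcher-driven script could look like, as opposed to a single in-process API call. It is not the code in this PR: `shard_for_process` and `generate_qa_pair` are hypothetical names, the passages are placeholder data, and the sketch assumes the launcher exports torchrun-style `RANK` and `WORLD_SIZE` environment variables to each worker. With accelerate installed, the same two values should be available as `PartialState().process_index` and `PartialState().num_processes`.

```python
import os


def shard_for_process(items, process_index, num_processes):
    """Return the contiguous slice of `items` that one process should handle.

    The first `len(items) % num_processes` processes get one extra item,
    so every item is assigned to exactly one process.
    """
    base, extra = divmod(len(items), num_processes)
    start = process_index * base + min(process_index, extra)
    end = start + base + (1 if process_index < extra else 0)
    return items[start:end]


def generate_qa_pair(passage):
    # Hypothetical stand-in for the real QA generation step (a model call).
    return {"question": f"What is stated in: {passage}?", "answer": passage}


def main():
    # Assumption: the launcher sets these for each worker process.
    # They default to a single-process run when the script is started directly.
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))

    passages = [f"passage {i}" for i in range(10)]  # placeholder data
    for passage in shard_for_process(passages, rank, world_size):
        print(rank, generate_qa_pair(passage))


if __name__ == "__main__":
    main()
```

Saved as, say, `qa_gen_sketch.py` (a hypothetical filename), this would be started with `accelerate launch qa_gen_sketch.py`. Each worker then handles only its own slice, which is why the dataset can no longer be produced by one function call inside the training process.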

Some metrics:

  1. Script with accelerate:
real    0m42.724s
user    1m21.831s
sys     2m13.471s
  2. Script without accelerate:
real    0m21.747s
user    0m27.539s
sys     0m18.316s

On this run the accelerate version is slower (about 43 s versus 22 s wall-clock). I believe the gains in QA generation would only be seen when the dataset is large.
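To put the two runs side by side, the sketch below converts the `time` output quoted above into seconds and compares wall-clock and CPU time. The figures are copied from this comment; the helper names (`to_seconds`, `summarize`) are made up for illustration.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '1m21.831s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Timings copied from the metrics in this comment.
TIMINGS = {
    "with accelerate": {"real": "0m42.724s", "user": "1m21.831s", "sys": "2m13.471s"},
    "without accelerate": {"real": "0m21.747s", "user": "0m27.539s", "sys": "0m18.316s"},
}


def summarize(run):
    """Return (wall-clock seconds, total CPU seconds) for one run."""
    wall = to_seconds(run["real"])
    cpu = to_seconds(run["user"]) + to_seconds(run["sys"])
    return wall, cpu


if __name__ == "__main__":
    for name, run in TIMINGS.items():
        wall, cpu = summarize(run)
        print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

The accelerate run takes roughly twice the wall-clock time and several times the CPU time, which is consistent with a fixed startup cost dominating on a small dataset. These two measurements alone do not show where the break-even point lies.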
