Add red teaming orchestrators to replace RedTeamingBot #84
…redteamingbot_orchestrator
…ng and end-token support.
Great work @romanlutz! Love the architecture and separation. Really good design.
The only thing missing is tests, but everything looks good to me.
Thanks! I've been looking into tests the last couple of days. There are a few things that we still need to settle on, including
I would say
Good point. We may want to add a constraint on certain types of converters so they can only do one. But let's design it.
I think this is okay. If we run across use cases where folks want to reuse it, we can add that functionality, but I'd expect it to be manual e.g. |
Will need to address these at a later point and will follow up with Gary. Would prefer to unblock the team by merging this tonight if possible.
Description
This PR adds the `BaseRedTeamingOrchestrator` abstract class as well as two implementation classes, `ScoringRedTeamingOrchestrator` and `EndTokenRedTeamingOrchestrator`. With these changes, a few others were necessary and are therefore part of this PR:
- `AttackStrategy` that encapsulates the functionality previously spread over 3 input args of `RedTeamingBot` (`attack_strategy`, `attack_strategy_kwargs`, and `conversation_objective`) and allows for just string inputs as strategies, too.
- `RedTeamingBot` needed to be updated and all the logic replaced. This includes several notebooks and documentation files.
- `GandalfTarget` which does not support async.

What is not included here?
- Tests
- Documentation
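
For readers who want a feel for the shape of this design, here is a minimal sketch of an abstract orchestrator with a scoring variant and an end-token variant, plus a strategy object that also accepts plain strings. Only the four class names come from the description above. Everything else is an assumption made for illustration and is not the code in this PR: the constructor arguments, the `run` and `is_conversation_complete` methods, the `attacker`, `target`, and `scorer` callables, the `$placeholder` template syntax, the default end token, and the choice to look for the end token in the attacker's message.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from string import Template
from typing import Callable, Dict, List, Union


@dataclass
class AttackStrategy:
    """Hypothetical stand-in: one object instead of three separate arguments."""

    strategy: str  # template text with $placeholders (assumed syntax)
    kwargs: Dict[str, str] = field(default_factory=dict)

    def __str__(self) -> str:
        # Fill in the placeholders; unknown ones are left untouched.
        return Template(self.strategy).safe_substitute(self.kwargs)


class BaseRedTeamingOrchestrator(ABC):
    """Runs the attacker/target loop; subclasses decide when to stop."""

    def __init__(
        self,
        attack_strategy: Union[str, AttackStrategy],
        attacker: Callable[[str, List[str]], str],  # hypothetical red teaming model
        target: Callable[[str], str],  # hypothetical system under test
    ) -> None:
        # A plain string and an AttackStrategy are both reduced to text here.
        self._strategy_text = str(attack_strategy)
        self._attacker = attacker
        self._target = target
        self.history: List[str] = []

    @abstractmethod
    def is_conversation_complete(self, attacker_message: str, target_response: str) -> bool:
        """Return True once the conversation should end."""

    def run(self, max_turns: int = 5) -> bool:
        """Return True if the stop condition was met within max_turns."""
        for _ in range(max_turns):
            prompt = self._attacker(self._strategy_text, self.history)
            response = self._target(prompt)
            self.history.extend([prompt, response])
            if self.is_conversation_complete(prompt, response):
                return True
        return False


class ScoringRedTeamingOrchestrator(BaseRedTeamingOrchestrator):
    """Stops when a scorer callable flags the target's response (assumed behavior)."""

    def __init__(self, attack_strategy, attacker, target, scorer: Callable[[str], bool]) -> None:
        super().__init__(attack_strategy, attacker, target)
        self._scorer = scorer

    def is_conversation_complete(self, attacker_message: str, target_response: str) -> bool:
        return self._scorer(target_response)


class EndTokenRedTeamingOrchestrator(BaseRedTeamingOrchestrator):
    """Stops when the attacker emits an end token (assumed behavior and token)."""

    def __init__(self, attack_strategy, attacker, target, end_token: str = "<|done|>") -> None:
        super().__init__(attack_strategy, attacker, target)
        self._end_token = end_token

    def is_conversation_complete(self, attacker_message: str, target_response: str) -> bool:
        return self._end_token in attacker_message
```

In this sketch the base class owns the loop and the subclasses only override the stop condition, which is one way to read the "architecture and separation" praised in the review. Note that the simplified loop still forwards the attacker's end-token message to the target before stopping; a real implementation may well check first.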