Attack fixes #555
Not sure why the cohere library is causing failures. We should fix that in another branch.
…robe to clean up generator state.
… often with ConversationalPipeline.
…s are specified alongside target_generator
…rget. Check for success in GCG attack manager.
…e attack success detection.
…for conversational models; Raise and log a more useful error in clean_attacks_and_convs
…mprove checking for conversion to OpenAI API messages.
Force-pushed from 0d9e96c to 9b975b4.
Rebased and force-pushed to fix Cohere error.
…ators without instantiating twice.
… probe. Add error handling to generative GCG, TAP, PAIR, and AutoDAN probes.
… target models in TAP to work better with VRAM. Modify GCG, TAP, and AutoDAN probes to return an empty list to avoid assertion errors. Fix AutoDAN mutation generator.
@@ -1429,7 +1431,7 @@ def run(
     test_steps: int = 50,
     incr_control: bool = True,
     stop_on_success: bool = True,
-    verbose: bool = True,
+    verbose: bool = False,
Can this be taken from `garak._config.system.verbose`?
It can! That's a good one.
target_max_tokens: int = 150,
evaluator_max_tokens: int = 10,
evaluator_temperature: float = 0.0,
self,
hey do you know how these params can be set when invoking via the probe?
Looks like we need a `ProbeConfig` class in `_config`.
Values in `_config.plugins.probes.ClassName` are automatically set in classes at instantiation time. Just wondering a bit about the stuff in `resources/`. Maybe let's wait for demand before dealing with that.
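For readers unfamiliar with that pattern, here is a rough sketch of per-class config values being applied at instantiation time. Everything in it is an assumption made for illustration: `_config` is a `SimpleNamespace` stand-in, the per-class settings are modelled as a plain dict keyed by class name, and `ProbeSketch` and `TAPSketch` are hypothetical classes rather than garak's own. The three parameter names and defaults are taken from the diff above.

```python
from types import SimpleNamespace

# Stand-in for _config.plugins.probes (assumption): a dict mapping a
# class name to the attribute values that should override its defaults.
_config = SimpleNamespace(
    plugins=SimpleNamespace(probes={"TAPSketch": {"target_max_tokens": 200}})
)


class ProbeSketch:
    """Hypothetical base class that applies config overrides in __init__."""

    def __init__(self):
        overrides = _config.plugins.probes.get(type(self).__name__, {})
        for name, value in overrides.items():
            # Set on the instance, so class-level defaults stay untouched.
            setattr(self, name, value)


class TAPSketch(ProbeSketch):
    # Defaults mirror the parameter values shown in the diff.
    target_max_tokens = 150
    evaluator_max_tokens = 10
    evaluator_temperature = 0.0


probe = TAPSketch()
```

In this sketch only `target_max_tokens` is overridden on `probe`; the other two attributes fall back to the class defaults.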
this is a lot of work, the integration looks like it has continued to be non-trivial, but the harmonisation now looks clear
nice
Fixes and improvements to TAP, GCG, AutoDAN.