Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add dall-e example for image generation LLM jailbreaking #82

Closed
wants to merge 1 commit into from

Conversation

ysy970923
Copy link
Contributor

@ysy970923 ysy970923 commented Mar 4, 2024

Description

Added a Image Generation AI Jailbreaking Example noteboook with Dall-E.
Pull request based on suggestion of @romanlutz. (based on Issue #74 )

Tests

  • no new tests required
  • new tests added
  • existing tests adjusted

Documentation

  • no documentation changes needed
  • documentation added or edited
  • example notebook added or updated

@ysy970923
Copy link
Contributor Author

This currently does jailbreak dall-e, but to increase efficiency and performance I will optimize the prompts and the pipeline if neeed in the near future.

@ysy970923
Copy link
Contributor Author

ysy970923 commented Mar 4, 2024 via email

@romanlutz
Copy link
Contributor

Thanks @ysy970923 , this is quite impressive! I'm doing a major refactor of RedTeamingBot at the moment, so I'll get back to you when that's merged since we'll need to make some adjustments (mostly simplifications).

In the meantime, could you perhaps use something like "a person cutting down a stop sign" as the image generation goal?

Thanks for your patience in the meantime!

@ysy970923
Copy link
Contributor Author

Thanks for the feedback!
I'm doing some optimizations so no need to hurry 👍
I'll try some more image generation goals like the one you suggested.

Thank you :)

@romanlutz
Copy link
Contributor

Thanks for the feedback! I'm doing some optimizations so no need to hurry 👍 I'll try some more image generation goals like the one you suggested.

Thank you :)

Hi @ysy970923 I merged my PR #84 which removes RedTeamingBot and replaces it with red teaming orchestrators. Your PR needs a few updates based on that. Also, @jbolor21 is looking into creating support for image targets and can update you here when that's added.

Feel free to ping me when you make the updates you mentioned earlier 🙂

@rlundeen2
Copy link
Contributor

We should be in a good state to implement this attack;

We have a dall-e target and the red teaming orchestrator that you should be able to combine. We also have the memory component that should contain details, but you may need a new scorer to use this information with the red teaming bot. Let us know if you're planning on doing it or have any questions!

@romanlutz
Copy link
Contributor

romanlutz commented May 4, 2024

We should be in a good state to implement this attack;

We have a dall-e target and the red teaming orchestrator that you should be able to combine. We also have the memory component that should contain details, but you may need a new scorer to use this information with the red teaming bot. Let us know if you're planning on doing it or have any questions!

I started looking into the required changes and they're quite extensive. I don't think we can expect you to make such huge changes so I'll take up this work. I'll keep it in this PR so that you get the credit for the original idea and implementation @ysy970923 if that's okay with you. If not, we can close this one and I'll open a new one.

Update: didn't have permissions to push to your fork, so I'll create a new PR (while making sure to credit you, of course.)

@ysy970923
Copy link
Contributor Author

We should be in a good state to implement this attack;

We have a dall-e target and the red teaming orchestrator that you should be able to combine. We also have the memory component that should contain details, but you may need a new scorer to use this information with the red teaming bot. Let us know if you're planning on doing it or have any questions!

I started looking into the required changes and they're quite extensive. I don't think we can expect you to make such huge changes so I'll take up this work. I'll keep it in this PR so that you get the credit for the original idea and implementation @ysy970923 if that's okay with you. If not, we can close this one and I'll open a new one.

thanks. I was thinking of how to implement this but was quite challenging.
I appreciate giving me credit thanks so much 🙂

@romanlutz
Copy link
Contributor

We should be in a good state to implement this attack;
We have a dall-e target and the red teaming orchestrator that you should be able to combine. We also have the memory component that should contain details, but you may need a new scorer to use this information with the red teaming bot. Let us know if you're planning on doing it or have any questions!

I started looking into the required changes and they're quite extensive. I don't think we can expect you to make such huge changes so I'll take up this work. I'll keep it in this PR so that you get the credit for the original idea and implementation @ysy970923 if that's okay with you. If not, we can close this one and I'll open a new one.

thanks. I was thinking of how to implement this but was quite challenging. I appreciate giving me credit thanks so much 🙂

Of course! Since I couldn't add to this PR due to restricted permissions I've opened #189

@romanlutz
Copy link
Contributor

Closing in favor of #189

@romanlutz romanlutz closed this May 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants