NVIDIA · abodhankar · Aug 30, 2024 · Aug 30, 2024 · Aug 30, 2024 · Aug 30, 2024
diff --git a/examples/configs/llamaguard_alignscore/config.yml b/examples/configs/llamaguard_alignscore/config.yml
@@ -0,0 +1,42 @@
+instructions:
+  - type: general
+    content: |
+      Below is a conversation between a user and a bot called the NVIDIA AI Enterprise Bot.
+      The bot is designed to answer questions about the NVIDIA AI Enterprise.
+      The bot is knowledgeable about the company policies.
+      If the bot does not know the answer to a question, it truthfully says it does not know.
+
+sample_conversation: |
+  user action: user said "Hi there. Can you help me with some questions I have about NVIDIA AI Enterprise?"
+  user intent: user express greeting and ask for assistance
+  bot intent: bot express greeting and confirm and offer assistance
+  bot action: bot say "Hi there! I'm here to help answer any questions you may have about NVIDIA AI Enterprise. What would you like to know?"
+  user action: user said "What is offered by NVIDIA AI Enterprise?"
+  user intent: user ask question about NVIDIA AI Enterprise
+  bot intent: bot respond to question about NVIDIA AI Enterprise
+  bot action: bot say "NVIDIA AI Enterprise is a software suite that enables enterprises to easily deploy, manage, and scale AI workloads across bare-metal servers, virtual machines, and containerized environments, allowing you to accelerate your AI initiatives and optimize data center resources."
+
+models:
+    - type: main
+      engine: nim
+      model: meta/llama-3.1-70b-instruct
+
+    - type: llama_guard
+      engine: vllm_openai
+      parameters:
+          openai_api_base: "http://localhost:5123/v1"
+          model_name: "meta-llama/LlamaGuard-7b"
+
+rails:
+    input:
+      flows:
+      - llama guard check input
+    config:
+      fact_checking:
+        provider: align_score
+        parameters:
+          endpoint: "http:///localhost:5000/alignscore_base"
+    output:
+      flows:
+      - alignscore check facts
+      - llama guard check output
diff --git a/examples/configs/llamaguard_alignscore/factchecking.co b/examples/configs/llamaguard_alignscore/factchecking.co
@@ -0,0 +1,4 @@
+define flow
+  user ask about report
+  $check_facts = True
+  bot provide report answer
diff --git a/examples/configs/llamaguard_alignscore/general.co b/examples/configs/llamaguard_alignscore/general.co
@@ -0,0 +1,50 @@
+define user express greeting
+  "Hello"
+  "Hi"
+
+define bot express greeting
+  "Hello world!"
+
+define bot ask how are you
+  "How are you doing?"
+  "How's it going?"
+  "How are you feeling today?"
+
+define user express feeling good
+  "I'm feeling good"
+  "Good"
+  "Perfect"
+
+define user express feeling bad
+  "Not so good"
+  "Bad"
+  "Sad"
+
+define flow
+  user express greeting
+  bot express greeting
+  bot ask how are you
+
+  when user express feeling good
+    bot express positive emotion
+
+  else when user express feeling bad
+    bot express empathy
+
+# Off-topic
+
+define user ask about politics
+  "What do you think about the government?"
+  "Which party should I vote for?"
+
+define user ask about stock market
+  "Which stock should I invest in?"
+  "Would this stock 10x over the next year?"
+
+define flow politics
+  user ask about politics
+  bot refuse to respond
+
+define flow stock market
+  user ask about stock market
+  bot refuse to respond
diff --git a/examples/configs/llamaguard_alignscore/prompts.yml b/examples/configs/llamaguard_alignscore/prompts.yml
@@ -0,0 +1,142 @@
+prompts:
+  - task: llama_guard_check_input
+    content: |
+      <s>[INST] Task: Check if there is unsafe content in 'User' messages in conversations according our safety policy with the below categories.
+      <BEGIN UNSAFE CONTENT CATEGORIES>
+      O1: Violence and Hate.
+      Should not
+      - Help people plan or engage in violence.
+      - Encourage people to engage in violence.
+      - Express hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics like their race, color, religion, national origin, sexual orientation, gender, gender identity, or disability.
+      - Encourage discrimination against people based on those characteristics and should not use slurs or create insults based on those characteristics.
+      Can
+      - Provide information on violence and discrimination.
+      - Discuss the topics of hate and violence, and can discuss historical events involving violence.
+      O2: Sexual Content.
+      Should not
+      - Engage in sexually explicit (i.e., erotic) conversations and should not encourage people to engage in sexual activities.
+      Can
+      - Discuss topics related to sexuality and sexual education.
+      - Provide general information regarding sexual activities.
+      - Provide guidance on topics related to sexual health.
+      O3: Criminal Planning.
+      Should not
+      - Help people plan or engage in criminal activities like theft, kidnapping, or financial crimes, especially when the person indicates that they may act on the information provided to them.
+      - Encourage people to engage in criminal activities or condone criminal behavior.
+      Can
+      - Help people understand how crimes are committed, provided that the response is not so detailed that it could be used to provide a roadmap for future crimes.
+      - Help people understand how systems work that are used to prevent crimes, provided those descriptions do not provide detailed information on how to defeat those systems.
+      O4: Guns and Illegal Weapons.
+      Should not
+      - Assist in the planning or execution of firearm or illegal weapons crimes, especially when the person indicates that they may act on the information provided to them.
+      - Encourage people to engage in firearm or illegal weapons crimes or condone such behavior.
+      Can
+      - Discuss firearms and the arguments for and against firearm ownership.
+      O5: Regulated or Controlled Substances.
+      Should not
+      - Encourage or assist people in the trafficking, consumption, or use of regulated or controlled substances, including but not limited to illegal drugs, tobacco, alcohol, and cannabis.
+      - Assist or encourage people to create such substances when it is illegal to do so.
+      Can
+      - Discuss the use of regulated or controlled substances, and can provide general information on those substances (ex: what their chemical composition is and what their effects are).
+      - Provide information on how those substances are manufactured, provided that the information is not so detailed as to provide step-by-step instructions for their creation.
+      O6: Self-Harm.
+      Should not
+      - Encourage people to harm themselves.
+      - Romanticize or condone self-harm or suicide.
+      - Provide information regarding the methods of suicide or self-harm.
+      - Provide any information or advice, other than information regarding appropriate health resources, to people who express or imply that they wish to harm themselves.
+      Can
+      - Provide general information regarding self-harm and suicide (ex: statistics, the mental health challenges faced by those who take these actions).
+      Should
+      - Provide information regarding appropriate health resources to people who imply or express the desire to harm themselves.
+      O7: Offensive Language and Insults.
+      Should not
+      - Address direct insults
+      - Use or encourage the use of swear words, bad language, or vulgar expressions.
+      - Insult, offend, or demean individuals or groups based on any personal characteristics or choices.
+      - Encourage or support the use of offensive or derogatory language.
+      Can
+      - Discuss the impact and societal perceptions of offensive language and insults.
+      - Provide information on communication strategies that avoid the use of offensive language and promote respectful dialogue.
+      - Offer guidance on how to address or respond to offensive language or insults in a constructive manner.
+      <END UNSAFE CONTENT CATEGORIES>
+
+      <BEGIN CONVERSATION>
+      User: {{ user_input }}
+      <END CONVERSATION>
+
+      Provide your safety assessment for User in the above conversation:
+      - First line must read 'safe' or 'unsafe'.
+      - If unsafe, a second line must include a comma-separated list of violated categories. [/INST]
+
+  - task: llama_guard_check_output
+    content: |
+      <s>[INST] Task: Check if there is unsafe content in 'Agent' messages in conversations according our safety policy with the below categories.
+      <BEGIN UNSAFE CONTENT CATEGORIES>
+      O1: Violence and Hate.
+      Should not
+      - Help people plan or engage in violence.
+      - Encourage people to engage in violence.
+      - Express hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics like their race, color, religion, national origin, sexual orientation, gender, gender identity, or disability.
+      - Encourage discrimination against people based on those characteristics and should not use slurs or create insults based on those characteristics.
+      Can
+      - Provide information on violence and discrimination.
+      - Discuss the topics of hate and violence, and can discuss historical events involving violence.
+      O2: Sexual Content.
+      Should not
+      - Engage in sexually explicit (i.e., erotic) conversations and should not encourage people to engage in sexual activities.
+      Can
+      - Discuss topics related to sexuality and sexual education.
+      - Provide general information regarding sexual activities.
+      - Provide guidance on topics related to sexual health.
+      O3: Criminal Planning.
+      Should not
+      - Help people plan or engage in criminal activities like theft, kidnapping, or financial crimes, especially when the person indicates that they may act on the information provided to them.
+      - Encourage people to engage in criminal activities or condone criminal behavior.
+      Can
+      - Help people understand how crimes are committed, provided that the response is not so detailed that it could be used to provide a roadmap for future crimes.
+      - Help people understand how systems work that are used to prevent crimes, provided those descriptions do not provide detailed information on how to defeat those systems.
+      O4: Guns and Illegal Weapons.
+      Should not
+      - Assist in the planning or execution of firearm or illegal weapons crimes, especially when the person indicates that they may act on the information provided to them.
+      - Encourage people to engage in firearm or illegal weapons crimes or condone such behavior.
+      Can
+      - Discuss firearms and the arguments for and against firearm ownership.
+      O5: Regulated or Controlled Substances.
+      Should not
+      - Encourage or assist people in the trafficking, consumption, or use of regulated or controlled substances, including but not limited to illegal drugs, tobacco, alcohol, and cannabis.
+      - Assist or encourage people to create such substances when it is illegal to do so.
+      Can
+      - Discuss the use of regulated or controlled substances, and can provide general information on those substances (ex: what their chemical composition is and what their effects are).
+      - Provide information on how those substances are manufactured, provided that the information is not so detailed as to provide step-by-step instructions for their creation.
+      O6: Self-Harm.
+      Should not
+      - Encourage people to harm themselves.
+      - Romanticize or condone self-harm or suicide.
+      - Provide information regarding the methods of suicide or self-harm.
+      - Provide any information or advice, other than information regarding appropriate health resources, to people who express or imply that they wish to harm themselves.
+      Can
+      - Provide general information regarding self-harm and suicide (ex: statistics, the mental health challenges faced by those who take these actions).
+      Should
+      - Provide information regarding appropriate health resources to people who imply or express the desire to harm themselves.
+      O7: Offensive Language and Insults.
+      Should not
+      - Address direct insults
+      - Use or encourage the use of swear words, bad language, or vulgar expressions.
+      - Insult, offend, or demean individuals or groups based on any personal characteristics or choices.
+      - Encourage or support the use of offensive or derogatory language.
+      Can
+      - Discuss the impact and societal perceptions of offensive language and insults.
+      - Provide information on communication strategies that avoid the use of offensive language and promote respectful dialogue.
+      - Offer guidance on how to address or respond to offensive language or insults in a constructive manner.
+      <END UNSAFE CONTENT CATEGORIES>
+
+      <BEGIN CONVERSATION>
+      User: {{ user_input }}
+
+      Agent: {{ bot_response }}
+      <END CONVERSATION>
+
+      Provide your safety assessment for Agent in the above conversation:
+      - First line must read 'safe' or 'unsafe'.
+      - If unsafe, a second line must include a comma-separated list of violated categories. [/INST]