
Experiments and observations using LangChain with sft models (Tools/Data retrieval) #2193

Closed
draganjovanovich opened this issue Mar 24, 2023 · 10 comments


@draganjovanovich
Collaborator

draganjovanovich commented Mar 24, 2023

I have been experimenting with LangChain and my own "tool usage for LLMs" pipeline for a while now. Here I will share some of my experiment results, observations, and ideas, so we can decide whether we want to include something similar in OA as well.

From my experience, if we want tools/plugins to be reliable, we need a model that can strictly follow instructions and patterns.

Currently, the only model I was able to use without any real struggle is llama_30b_sft. All the others fail on the second, and often the first, question asked by the prompter.
By the way, for regular Q&A without tools and CoT requirements, llama_13b_sft also seems fine-ish...

Of course, with "clever" prompting there is a chance for the currently available smaller models to perform better, but I would not recommend it, because they just do not behave "smart" enough.

One possibility I see for using smaller models with tools is to prepare dataset(s) for SFT/RLHF that would help them get better at CoT and at following complex rules and instructions.
For now, it seems the lowest threshold we could try is 13B.

All of the statements I have made are based not only on testing and experimenting with LangChain, but also on my custom prompts and helpers to guide the models in the "right" direction (you may have seen those examples in https://discord.com/channels/1055935572465700980/1067096888530178048).

The model performed best with the conversational-react-description agent; it would be great if we could achieve good performance with zero-shot-react-description as well.

A small pseudo-code snippet of custom LangChain usage with Hugging Face models:

from langchain import HuggingFacePipeline, LLMMathChain
from langchain.agents import Tool, initialize_agent
from langchain.memory import ConversationBufferMemory
from langchain.utilities import GoogleSearchAPIWrapper, PythonREPL
import gradio as gr
import datetime
import os
from transformers import AutoModelForCausalLM, LlamaTokenizer, pipeline

# HERE YOU DEFINE ENV VARS FOR TOKENS, AND API KEYS

model_id = "hf/repo"
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto",
                                             load_in_8bit=True)
pipe = pipeline("text-generation",
                model=model,
                tokenizer=tokenizer,
                max_new_tokens=512,
                top_k=50,
                temperature=0.0,
                repetition_penalty=(1 / 0.83),
                )
llm = HuggingFacePipeline(pipeline=pipe)

llm_math_chain = LLMMathChain(llm=llm, verbose=True)
search = GoogleSearchAPIWrapper()
python_repl = PythonREPL()

tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for when you need to answer questions about current events or the current state of the world\nDO NOT USE THIS FOR SIMPLE QUESTIONS!"
    ),
    Tool(
        name="Calculator",
        func=llm_math_chain.run,
        description="useful for when you need to answer questions about mathematically related topics\nUSE ONLY FOR MATH RELATED QUESTIONS!"
    ),
    Tool(
        name="Python REPL",
        func=python_repl.run,
        description="A Python shell. Use this to execute python commands. "
                    "Input should be a valid python command. "
                    "If you want to see the result, you should print it out "
                    "with `print(...)`."
    )
]
tool_names = [tool.name for tool in tools]

PREFIX = """Consider that you are AI Assistant named AI, AI is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.

Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.

Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.
You love to answer questions and you are very good at it.
Assistant has access to the following tools:"""

INSTRUCTIONS = """
To use a tool, please use the following format:
``
Thought: Should I use a tool? Yes
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
``
When you have a response to say to the Human, or if you do not need to use a tool, you MUST use the format:
``
Thought: Should I use a tool? No
AI: [your response here]
``
"""
SUFFIX = """
CHAT HISTORY:
{chat_history}

Current time: {current_time}
Knowledge date cutoff: 2021-09-01

When answering a question, you MUST use the following language: {language}
Begin!
Question: {input}
Thought: Should I use a tool?{agent_scratchpad}"""

memory = ConversationBufferMemory(memory_key="chat_history",# return_messages=True,
                                  input_key="input", output_key='output', ai_prefix='AI', human_prefix='User')

agent = initialize_agent(tools,
                         llm,
                         agent="conversational-react-description",
                         verbose=True,
                         memory=memory,
                         return_intermediate_steps=False,
                         agent_kwargs={'input_variables': ['input', 'agent_scratchpad', 'chat_history', 'current_time', 'language'],
                                       'prefix': PREFIX,
                                       'format_instructions': INSTRUCTIONS,
                                       'suffix': SUFFIX})
agent.agent.llm_chain.verbose = True

# depth must live at module level, since bot() declares it global
depth = 0

def main():
    with gr.Blocks() as app:
        chatbot = gr.Chatbot(elem_id="chatbot")
        state = gr.State([])

        with gr.Row():
            msg = gr.Textbox(show_label=False, placeholder="Enter text and press enter").style(container=False)

        def user(user_message, history):
            if history is None:
                history = []
            return "", history + [[user_message, None]]

        def bot(history):
            global depth
            depth += 1
            prompt = history[-1][0]
            res = agent({"input":prompt,
                         "current_time":datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
                         "language": "English"})

            response = res['output']
            history[-1][1] = response

            # free up some memory: drop the oldest message once we exceed
            # 8 exchanges or the buffer grows past 512 characters
            if depth > 8 or len(memory.buffer) > 512:
                memory.chat_memory.messages.pop(0)
            return history

        msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(
                bot, chatbot, chatbot)

    app.queue()
    app.launch(share=False, server_name="0.0.0.0")

if __name__ == "__main__":
    main()

What I see as useful for OA is that we could use this kind of setup for:

  • dataset generation (so we can, for example, finetune a small model to perform well in this kind of scenario; see the sketch after this list)
  • model performance evaluation
  • giving OA the ability to use tools/plugins
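
To make the dataset-generation idea concrete, here is a minimal sketch (assuming the agent and prompt setup from the snippet above, but built with return_intermediate_steps=True instead of False) that dumps each question's (thought, tool, tool input, observation, answer) trace to a JSONL file. The output schema itself is just an assumption for illustration:

import json

def record_trace(question, out_path="tool_traces.jsonl"):
    # assumes `agent` was built with return_intermediate_steps=True, so
    # res["intermediate_steps"] holds (AgentAction, observation) pairs
    res = agent({"input": question,
                 "current_time": datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
                 "language": "English"})
    steps = [{"thought": action.log,
              "tool": action.tool,
              "tool_input": action.tool_input,
              "observation": str(observation)}
             for action, observation in res["intermediate_steps"]]
    with open(out_path, "a") as f:
        f.write(json.dumps({"question": question,
                            "steps": steps,
                            "answer": res["output"]}) + "\n")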

Some demos:
[Screenshot 2023-03-24 at 17 44 06]
Some background stuff:
[Screenshot 2023-03-24 at 17 43 56]
[Screenshot 2023-03-24 at 11 40 59]
In this last demo, the model generated correct code blocks, but gradio messed up the rendering.

The prompt used in most cases when I was trying smaller models that are unable to follow LangChain patterns:
https://pastebin.pl/view/3541703a

NEW DEMO EXAMPLES (added a Vision tool):
The Vision tool is basically just LAION's CoCa model + OCR, but it helped llama generate some interesting comments :) A rough sketch of such a tool follows below the screenshots.

[Screenshot 2023-04-05 at 11 41 18]

[Screenshot 2023-04-05 at 11 30 50]
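
For anyone curious, here is a rough sketch of what such a Vision tool could look like, pairing CoCa captioning from open_clip with pytesseract OCR and wrapping both as a LangChain Tool. The model and checkpoint names follow the open_clip README, but treat the exact choices as assumptions rather than what I actually ran:

import open_clip
import pytesseract
import torch
from PIL import Image
from langchain.agents import Tool

# CoCa captioner from open_clip (checkpoint name per the open_clip README)
coca, _, transform = open_clip.create_model_and_transforms(
    "coca_ViT-L-14", pretrained="mscoco_finetuned_laion2B-s13B-b90k")

def describe_image(path: str) -> str:
    image = Image.open(path).convert("RGB")
    with torch.no_grad():
        tokens = coca.generate(transform(image).unsqueeze(0))
    caption = open_clip.decode(tokens[0])
    caption = caption.replace("<start_of_text>", "").split("<end_of_text>")[0].strip()
    ocr_text = pytesseract.image_to_string(image).strip()  # needs the tesseract binary installed
    return f"Caption: {caption}" + (f"\nText in image: {ocr_text}" if ocr_text else "")

vision_tool = Tool(
    name="Vision",
    func=describe_image,
    description="useful for describing an image and reading any text in it; "
                "input should be a path to an image file")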

@draganjovanovich draganjovanovich changed the title Experiments and observations using LangChain with OA sft models (Tools/Data retrieval) Experiments and observations using LangChain with sft models (Tools/Data retrieval) Mar 24, 2023
@draganjovanovich
Collaborator Author

draganjovanovich commented Mar 24, 2023

And this is the prompt I used when I needed tool usage with small models that do not work with LangChain; it's also not great below 13B params:
https://pastebin.pl/view/3541703a

@PaulConyngham

This is awesome. I love your idea around dataset generation.

@draganjovanovich
Collaborator Author

draganjovanovich commented Apr 3, 2023

Something like this should also be considered, so that in the future we can support those community-created plugins for OpenAI models as well:
https://platform.openai.com/docs/plugins/examples
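
For reference, a minimal sketch of how such plugins could be discovered: OpenAI-style plugins expose a manifest at /.well-known/ai-plugin.json that points to an OpenAPI spec. The path and field names below follow the published plugin spec; the plugin URL is hypothetical, and error handling is omitted:

import requests

def load_plugin_manifest(base_url: str) -> dict:
    # the well-known manifest location defined by the OpenAI plugin spec
    resp = requests.get(f"{base_url.rstrip('/')}/.well-known/ai-plugin.json", timeout=10)
    resp.raise_for_status()
    return resp.json()

manifest = load_plugin_manifest("https://example-plugin.example.com")  # hypothetical URL
print(manifest["name_for_model"])         # short name the model refers to the tool by
print(manifest["description_for_model"])  # usable as a LangChain Tool description
print(manifest["api"]["url"])             # OpenAPI spec describing the endpoints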

@alif-munim

This is awesome! Would love to contribute if you open up any issues in the future

@theblackcat102
Collaborator

theblackcat102 commented Apr 25, 2023

Hi @draganjovanovich, I am working on converting the MOSS dataset, which includes Calculator, Search, Text-to-Image, and Equation-solver tools, to the OA format.
Their format is quite similar to yours (Question, Thought, Action/Action Input, and Answer Observation => Response):

{
        "Human": "<|Human|>: Can you draw me a fox holding a card?<eoh>\n",
        "Inner Thoughts": "<|Inner Thoughts|>: The user's demand is to draw picture, and I need to generate commands that can draw high-quality image according to the user's needs.<eot>\n",
        "Commands": "<|Commands|>: Text2Image(\"a fox holding a card\")<eoc>\n",
        "Tool Responses": "<|Results|>: Image generated successfully.<eor>\n",
        "MOSS": "<|MOSS|>:  Well, here is the generated image.<eom>\n"
}

I might need the dataset format for SFT on tool use; have you decided on it yet? For now, I think the sections the model needs to learn are Thought, Action/Action Input, and Response, while the Answer Observation is masked from the loss. A minimal sketch of that masking is below.
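
A minimal sketch of that masking idea: flatten one MOSS record (as above) into (text, compute_loss) segments where the tool results are excluded from the loss. The segment keys mirror the MOSS JSON, but the output schema is just an assumption:

def moss_to_segments(record: dict) -> list[tuple[str, bool]]:
    # (text, compute_loss) pairs in conversation order
    return [
        (record["Human"], False),           # user turn: no loss
        (record["Inner Thoughts"], True),   # Thought: learn
        (record["Commands"], True),         # Action / Action Input: learn
        (record["Tool Responses"], False),  # observation: masked from the loss
        (record["MOSS"], True),             # final response: learn
    ]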

@draganjovanovich
Collaborator Author

Please check out my PR for the plugin system and read plugins.md; afterward, maybe it would be best for us to have a short Discord call?

@prof-schacht

Is there still work to do where my team and I could help?

@draganjovanovich
Collaborator Author

Hi, thanks for reaching out.

We have now incorporated a simple MVP of a plugin system into production; the relevant PR was #2765.
I have started tinkering with dataset generation for improving LLM performance in such plugin-usage scenarios, as explained in the original message.

To see how plugins are currently implemented, the relevant files are in ./inference/worker/chat_chain*; the plugins.md file may also be helpful for now.

@pevogam
Contributor

pevogam commented May 25, 2023

Hi, for anyone interested, this can be run with 4-bit GPTQ-quantized 30B models like so:

import os
import datetime

# /conda/envs/oasst/bin/pip install gradio
import gradio as gr

from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer, pipeline

from langchain import HuggingFacePipeline, LLMMathChain
from langchain.agents import Tool, initialize_agent
from langchain.memory import ConversationBufferMemory
from langchain.utilities import GoogleSearchAPIWrapper, PythonREPL

# HERE YOU DEFINE ENV VARS FOR TOKENS, AND API KEYS

quantized_model_dir = "/data/models--TheBloke--OpenAssistant-SFT-7-Llama-30B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=True)
quantize_config = BaseQuantizeConfig(
    bits=4,  # quantize the model to 4-bit
    group_size=128,  # it is recommended to set this value to 128
)
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0", use_triton=False,
                                           quantize_config=quantize_config, use_safetensors=True,
                                           model_basename="OpenAssistant-30B-epoch7-GPTQ-4bit-1024g.compat.no-act-order")

"""
# TODO: better to use explicit default pipeline?
from transformers import TextGenerationPipeline
pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(pipeline("auto-gptq is")[0]["generated_text"])
"""
pipe = pipeline("text-generation",
                model=model,
                tokenizer=tokenizer,
                max_new_tokens=512,
                top_k=50,
                temperature=0.0,
                repetition_penalty=(1 / 0.83),
                )
llm = HuggingFacePipeline(pipeline=pipe)

# TODO: LangChain warns that directly instantiating an LLMMathChain with an llm
# is deprecated; prefer the llm_chain argument or the from_llm class method.
llm_math_chain = LLMMathChain(llm=llm, verbose=True)
# TODO: google API key not available
#search = GoogleSearchAPIWrapper()
python_repl = PythonREPL()

tools = [
    Tool(
        name="Calculator",
        func=llm_math_chain.run,
        description="useful for when you need to answer questions about mathematically related topics\nUSE ONLY FOR MATH RELATED QUESTIONS!"
    ),
    Tool(
        name="Python REPL",
        func=python_repl.run,
        description="A Python shell. Use this to execute python commands. "
                    "Input should be a valid python command. "
                    "If you want to see the result, you should print it out "
                    "with `print(...)`."
    )
]
tool_names = [tool.name for tool in tools]

PREFIX = """Consider that you are AI Assistant named AI, AI is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.

Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.

Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.
You love to answer questions and you are very good at it.
Assistant has access to the following tools:"""

INSTRUCTIONS = """
To use a tool, please use the following format:
``
Thought: Should I use a tool? Yes
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
``
When you have a response to say to the Human, or if you do not need to use a tool, you MUST use the format:
``
Thought: Should I use a tool? No
AI: [your response here]
``
"""
SUFFIX = """
CHAT HISTORY:
{chat_history}

Current time: {current_time}
Knowledge date cutoff: 2021-09-01

When answering a question, you MUST use the following language: {language}
Begin!
Question: {input}
Thought: Should I use a tool?{agent_scratchpad}"""

memory = ConversationBufferMemory(memory_key="chat_history",# return_messages=True,
                                  input_key="input", output_key='output', ai_prefix='AI', human_prefix='User')

agent = initialize_agent(tools,
                         llm,
                         agent="conversational-react-description",
                         verbose=True,
                         memory=memory,
                         return_intermediate_steps=False,
                         agent_kwargs={'input_variables': ['input', 'agent_scratchpad', 'chat_history', 'current_time', 'language'],
                                       'prefix': PREFIX,
                                       'format_instructions': INSTRUCTIONS,
                                       'suffix': SUFFIX})
agent.agent.llm_chain.verbose = True

depth = 0
def main():
    with gr.Blocks() as app:
        chatbot = gr.Chatbot(elem_id="chatbot")
        state = gr.State([])

        with gr.Row():
            msg = gr.Textbox(show_label=False, placeholder="Enter text and press enter").style(container=False)

        def user(user_message, history):
            if history is None:
                history = []
            return "", history + [[user_message, None]]

        def bot(history):
            global depth
            depth += 1
            prompt = history[-1][0]
            res = agent({"input":prompt,
                         "current_time":datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
                         "language": "English"})

            response = res['output']
            history[-1][1] = response

            # free up some memory: drop the oldest message once we exceed
            # 8 exchanges or the buffer grows past 512 characters
            if depth > 8 or len(memory.buffer) > 512:
                memory.chat_memory.messages.pop(0)
            return history

        msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(
                bot, chatbot, chatbot)

    app.queue()
    app.launch(share=False, server_name="0.0.0.0")

if __name__ == "__main__":
    main()

@olliestanley
Collaborator

Closing this now that we have the plugin system, but if there are other parts of it that could be separated out into their own smaller issues, feel free to do so.
