Sparse Rewards Can Self-Train Dialogue Agents

Barrett Martin Lattimer, Varun Gangal, Ryan McDonald, Yi Yang

This repo runs JOSH, the ToolWOZ, and τ-bench dataset. This repo also contains ways of logging training and preference-annotated episodes from user-simulator interactions and LORA-driven preference tuning of small LLMs from such preference annotated experience.

Setup

Run the following in a new env

pip install -e .

Unzip the dataset.zip file in the data folder
Set up your openai credentials

export OPENAI_API_KEY= # api_key
export OPENAI_ORGANIZATION= # api_org

If you're running Llama or another local model, you will also need to set HF_TOKEN much in the same way. Wherever you see HF_KEY please replace it by your huggingface token.

Running ToolWOZ

You can run ToolWOZ normally by doing the following

python josh_train/main.py

Increase the --max_concurrency depending on your api rate limits

JOSH on ToolWOZ

Enable JOSH on ToolWOZ by adding the --josh flag, and make the running of JOSH print updates by also adding --josh_debug

One example of a more involved JOSH prompt would be the following

python josh_train/main.py --josh --josh_debug --max_concurrency 20 --seed 20 --task_split train --temperature 1.0 --agent_strategy react --user_mode goal --model gpt-4o-mini --end_index 10 --beam_size 8

Running τ-bench

We have added a clone of τ-bench to this repo with two run files, one for normal τ-bench testing and another for JOSH rollouts on τ-bench

To run τ-bench normally you can do

python tau-bench-eval/run.py

JOSH on τ-bench

To run JOSH on τ-bench you can do

python tau-bench-eval/run.py --josh --debug

Using JOSH

A class of JOSH is provided in this repo to be very flexible and work for a wide variety of user/agent interactions. To use JOSH yourself, you can start with the following code snippit

from josh_train.josh import JOSH, BaseJOSHAgent, BaseRewards, BaseJOSHUser
def add_error_message(agent):
        agent.messages.append({'role':'assistant', 'content':'Error: Agent ran out of retries.'})
        return agent
    
def step_agent(agent:BaseJOSHAgent, **kwargs):
    pass_to_customer = agent.step(**kwargs)
    return agent, pass_to_customer

def step_user(user:BaseJOSHUser, agent:BaseJOSHAgent):
    agent, end_conversation = user.step(agent)
    return agent, end_conversation

josh = JOSH(
            rewards=BaseRewards(['say hello', 'say hello', 'say hello']),
            agent_step=step_agent,
            user_step=step_user,
            add_error_message=add_error_message,
            root_agent = BaseJOSHAgent(),
            user = BaseJOSHUser(),
            debug=True
        )

for _ in range(10):
    max_reward, all_done = josh.step()
    if all_done:
        break

print(max_reward)
print(josh.training_examples)

All classes can be built on top of, and expanded for further use.

MT-Bench

(If you want to later evaluate MTBench)

unzip mtbencheval.zip

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
data		data
db		db
josh_train		josh_train
prompts		prompts
tau-bench-eval		tau-bench-eval
.gitignore		.gitignore
README.md		README.md
final_file_push.py		final_file_push.py
final_file_push_kto.py		final_file_push_kto.py
final_file_push_taubench.py		final_file_push_taubench.py
final_file_push_taubench_kto.py		final_file_push_taubench_kto.py
instructionsToRunMTBench.md		instructionsToRunMTBench.md
mtbencheval.zip		mtbencheval.zip
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py
takeAggregateWithOptBaseline.py		takeAggregateWithOptBaseline.py
training_fnames.json		training_fnames.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sparse Rewards Can Self-Train Dialogue Agents

Setup

Running ToolWOZ

JOSH on ToolWOZ

Running τ-bench

JOSH on τ-bench

Using JOSH

MT-Bench

About

Releases

Packages

Contributors 3

Languages

asappresearch/josh-llm-simulation-training

Folders and files

Latest commit

History

Repository files navigation

Sparse Rewards Can Self-Train Dialogue Agents

Setup

Running ToolWOZ

JOSH on ToolWOZ

Running τ-bench

JOSH on τ-bench

Using JOSH

MT-Bench

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages