
Running benchmarks #812

Merged · ernestum merged 29 commits into master on Oct 30, 2023
Conversation

@ernestum (Collaborator)

Description

Add scripts to run the entire benchmark suite.

Testing

Description of how you've tested your changes.

@qxcv (Member) left a comment


Added some requests for clarification and small changes.

Review threads (resolved):
- src/imitation/util/sacred_file_parsing.py
- benchmarking/compute_probability_of_improvement.py (8 threads, outdated)
@ernestum (Collaborator Author)

To give you an idea of how the output CSV and Markdown look:
summary.csv
summary.md

@ernestum requested a review from qxcv on October 20, 2023 19:04
codecov bot commented Oct 23, 2023

Codecov Report

Merging #812 (dcde3a9) into master (d833d9e) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #812      +/-   ##
==========================================
+ Coverage   95.63%   95.67%   +0.03%     
==========================================
  Files         100      102       +2     
  Lines        9582     9654      +72     
==========================================
+ Hits         9164     9236      +72     
  Misses        418      418              
| Files | Coverage Δ |
| --- | --- |
| src/imitation/data/huggingface_utils.py | 98.00% <100.00%> (ø) |
| src/imitation/util/sacred_file_parsing.py | 100.00% <100.00%> (ø) |
| tests/util/test_sacred_file_parsing.py | 100.00% <100.00%> (ø) |


@taufeeque9 (Collaborator) left a comment


LGTM. I really like the benchmarking readme! It seems super helpful.

Review threads (resolved):
- benchmarking/compute_probability_of_improvement.py (2 threads, outdated)
run_all_benchmarks.sh
@@ -0,0 +1,201 @@
#!/bin/bash
python -m imitation.scripts.train_imitation bc with bc_seals_ant seed=1
Collaborator commented:

We can replace the whole file with the code below. Even if we want to keep the file with one line per command for parallelization purposes, it would be nice to add the code below in a separate file; the full file can then be generated by prefixing the python command with an echo. This will make future maintenance easier when we add other environments and algorithms.

#!/bin/bash

seeds=(1 2 3 4 5 6 7 8 9 10)
envs=(
    "seals_ant"
    "seals_half_cheetah"
    "seals_hopper"
    "seals_swimmer"
    "seals_walker"
)
script_algos=(
    "imitation" "bc"
    "imitation" "dagger"
    "adversarial" "airl"
    "adversarial" "gail"
)

for env in "${envs[@]}"; do
    for ((i=0; i<${#script_algos[@]}; i+=2)); do
        script=${script_algos[$i]}
        algo=${script_algos[$((i+1))]}
        for seed in "${seeds[@]}"; do
            python -m imitation.scripts.train_${script} ${algo} with ${algo}_${env} seed=${seed}
        done
    done
done
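For illustration, a minimal sketch of the echo-based regeneration described above (the generator filename is an assumption, not something added by this PR):

#!/bin/bash
# Hypothetical sketch: prefix the innermost command with `echo` so the loop
# prints the training commands instead of executing them, e.g.
#     echo python -m imitation.scripts.train_${script} ${algo} with ${algo}_${env} seed=${seed}
# and then redirect the output to regenerate the checked-in file.
bash generate_run_all_benchmarks.sh > run_all_benchmarks.sh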

@ernestum (Collaborator Author) replied:

I know that this can easily be generated. I used:

envs = [
    "seals_ant",
    "seals_half_cheetah",
    "seals_hopper",
    "seals_swimmer",
    "seals_walker",
]

for env in envs:
    for algo in ["bc", "dagger"]:
        for seed in range(1,11):
            print(f"python -m imitation.scripts.train_imitation {algo} with {algo}_{env} seed={seed}")

    for algo in ["airl", "gail"]:
        for seed in range(1, 11):
            print(f"python -m imitation.scripts.train_adversarial {algo} with {algo}_{env} seed={seed}")

However, I opted for the solution with fewer moving parts. Whenever we add new algorithms or change something else, I can just re-write this script in 3 minutes. As soon as we put the code in the repo we have to maintain it, test it, and document it. I think that is a lot of effort just to save us the 3 minutes it takes to re-write this one-off script.

Collaborator replied:

I suppose that's true.

Having the run_all_benchmarks.sh file in the repo feels slightly weird. Do you think we can remove the file entirely and just keep the Python/bash script above that generates those commands? There doesn't seem to be any need for the run_all_benchmarks.sh script at all.

@qxcv (Member) commented on Oct 27, 2023:

I don't think this matters much either way (GPT4 should be able to switch between these two formats at will), but I have a slight preference for the current approach because I think it makes it easier to see exactly what was executed, and also to run just a subset of the experiments by copy-pasting.

(The copy-pasting is particularly valuable because I usually end up having to manually assign jobs to GPUs or nodes or whatever, depending on what compute infra I'm using.)
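As an illustration of that copy-paste workflow (the CUDA_VISIBLE_DEVICES pinning is an assumption about the compute setup, not something this PR prescribes):

#!/bin/bash
# Copy a small subset of lines from run_all_benchmarks.sh and pin each job
# to a specific GPU by hand.
CUDA_VISIBLE_DEVICES=0 python -m imitation.scripts.train_imitation bc with bc_seals_ant seed=1 &
CUDA_VISIBLE_DEVICES=1 python -m imitation.scripts.train_imitation dagger with dagger_seals_ant seed=1 &
wait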

@@ -0,0 +1,21 @@
#!/bin/bash
sbatch --job-name=bc_seals_ant run_benchmark_on_slurm.sh train_imitation bc seals_ant
Collaborator commented:

We can replace the whole file here too, in a similar way to the above:

#!/bin/bash

envs=(
    "seals_ant"
    "seals_half_cheetah"
    "seals_hopper"
    "seals_swimmer"
    "seals_walker"
)

script_algos=(
    "imitation" "bc"
    "imitation" "dagger"
    "adversarial" "airl"
    "adversarial" "gail"
)

for env in "${envs[@]}"; do
    for ((i=0; i<${#script_algos[@]}; i+=2)); do
        script=${script_algos[$i]}
        algo=${script_algos[$((i+1))]}
        job_name="${algo}_${env}"
        echo sbatch --job-name="${job_name}" run_benchmark_on_slurm.sh "train_${script}" "${algo}" "${env}"
    done
done
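As a usage sketch (both filenames below are assumptions, not names from this PR): the loop only echoes the sbatch lines, so its output can either be redirected into a submission script or piped straight to the shell to submit everything at once.

# Regenerate a submission script from the generator (hypothetical filenames) ...
bash generate_slurm_jobs.sh > run_all_benchmarks_on_slurm.sh
# ... or submit all jobs directly.
bash generate_slurm_jobs.sh | bash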

@ernestum (Collaborator Author) replied:

See above comment.

@qxcv (Member) left a comment

I re-reviewed the diff from my last review. Looks good to merge!

On the debate about including the script that generates the experiment commands versus just including the script that runs the experiments: it seems like we're 2v1 in favor of the current approach, so I'm inclined to just merge as-is.

@ernestum (Collaborator Author)

Great. These are the final results: the raw values as well as the CSV and Markdown summaries. This is to be added as a release artifact:
benchmark_runs.zip

@ernestum merged commit de589d4 into master on Oct 30, 2023
15 checks passed
@ernestum deleted the running_benchmarks branch on October 30, 2023, 19:38