Actions: openai/evals

Showing runs from all workflows
270 workflow runs

Update ReadMe with New Cookbook link
Run unit tests #1703: Pull request #1507 opened by royziv11
March 27, 2024 23:16 4m 23s updateReadMeWithCookbook
Updates on existing solvers and bugged tool eval
Run unit tests #1702: Pull request #1506 synchronize by ojaffe
March 27, 2024 16:44 3m 49s ojaffe:ollie/updates_270324
Updates on existing solvers and bugged tool eval
Run new evals #2251: Pull request #1506 synchronize by ojaffe
March 27, 2024 16:44 3m 37s ojaffe:ollie/updates_270324
Updates on existing solvers and bugged tool eval
Run new evals #2250: Pull request #1506 opened by ojaffe
March 27, 2024 16:37 3m 41s ojaffe:ollie/updates_270324
Updates on existing solvers and bugged tool eval
Run unit tests #1701: Pull request #1506 opened by ojaffe
March 27, 2024 16:37 10m 44s ojaffe:ollie/updates_270324
Add Gemini Solver (#1503)
Run unit tests #1700: Commit 5a92ac3 pushed by JunShern
March 26, 2024 15:27 4m 12s main
Unified create_retrying for all solvers (#1501)
Run unit tests #1699: Commit 150dcb9 pushed by JunShern
March 26, 2024 15:25 4m 2s main
Add Gemini Solver
Run unit tests #1698: Pull request #1503 synchronize by ojaffe
March 26, 2024 11:39 3m 56s ojaffe:ollie/add_gemini_solver
Add Gemini Solver
Run new evals #2249: Pull request #1503 synchronize by ojaffe
March 26, 2024 11:39 3m 55s ojaffe:ollie/add_gemini_solver
Unified create_retrying for all solvers
Run unit tests #1697: Pull request #1501 synchronize by ojaffe
March 26, 2024 11:35 10m 15s ojaffe:ollie/unify_retrying
Add info about logging and link to logviz (#1480)
Run unit tests #1696: Commit ac44aae pushed by etr2460
March 25, 2024 15:53 9m 36s main
Log model and usage stats in record.sampling (#1449)
Run unit tests #1695: Commit 9b2e1b1 pushed by etr2460
March 25, 2024 15:52 9m 58s main
Address sporadic hanging of evals on certain samples (#1482)
Run unit tests #1694: Commit bfe3925 pushed by etr2460
March 25, 2024 15:51 9m 5s main
TogetherSolver (#1502)
Run unit tests #1693: Commit 5805c20 pushed by JunShern
March 22, 2024 09:50 8m 40s main
Add Gemini Solver
Run unit tests #1692: Pull request #1503 synchronize by ojaffe
March 21, 2024 10:36 14m 41s ojaffe:ollie/add_gemini_solver
Add Gemini Solver
Run new evals #2248: Pull request #1503 synchronize by ojaffe
March 21, 2024 10:36 3m 36s ojaffe:ollie/add_gemini_solver
Add Gemini Solver
Run new evals #2247: Pull request #1503 opened by ojaffe
March 21, 2024 10:32 3m 46s ojaffe:ollie/add_gemini_solver
Add Gemini Solver
Run unit tests #1691: Pull request #1503 opened by ojaffe
March 21, 2024 10:32 3m 55s ojaffe:ollie/add_gemini_solver
TogetherSolver
Run unit tests #1690: Pull request #1502 opened by thesofakillers
March 21, 2024 10:25 3m 42s thesofakillers:together_solver
TogetherSolver
Run new evals #2246: Pull request #1502 opened by thesofakillers
March 21, 2024 10:25 7m 10s thesofakillers:together_solver
Unified create_retrying for all solvers
Run unit tests #1689: Pull request #1501 opened by ojaffe
March 21, 2024 08:49 3m 49s ojaffe:ollie/unify_retrying
AnthropicSolver (#1498)
Run unit tests #1688: Commit e30e141 pushed by JunShern
March 21, 2024 04:15 3m 48s main
Add Human-Relative MLAgentBench (#1496)
Run unit tests #1687: Commit 4f97ce6 pushed by JunShern
March 21, 2024 03:47 4m 56s main
Add Human-Relative MLAgentBench
Run unit tests #1686: Pull request #1496 synchronize by danesherbs
March 21, 2024 03:36 3m 37s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run new evals #2245: Pull request #1496 synchronize by danesherbs
March 21, 2024 03:36 3m 45s danesherbs:dane/add-mlab-v2