Commit
Update agent arena frontend and evals (#666)
Adds the updated Agent Arena frontend and an evaluation folder to the Agent Arena repository. Site: https://www.agent-arena.com/
1 parent 37f61bf, commit c834e57
Showing 31 changed files with 171,426 additions and 709 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,74 +1,95 @@ | ||
# Agent Arena | ||
|
||
# Agent Arena Frontend | ||
**Agent Arena** is a platform for comparing and evaluating language model agents across different models, frameworks, and tools. It provides an interface for head-to-head comparisons and a leaderboard that ranks agents using user votes and an Elo rating system. | ||
|
||
This is the frontend of the [Agent Arena](https://www.agent-arena.com/), a platform where users can compare and evaluate various language model agents. The frontend is built using React and provides an interface for interacting with agents, creating comparisons, and viewing results. | ||
## Frontend | ||
|
||
## Contributing to Agent Arena | ||
The frontend of the Agent Arena is built using **React**. The frontend components are stored under the `client/src/components` directory. You can modify or enhance the UI by editing these files. | ||
|
||
If you'd like to contribute changes to the Agent Arena frontend, you can do so by creating a Pull Request (PR) in the Gorilla repository. Follow these steps: | ||
To get started with development on the frontend: | ||
|
||
1. **Fork the Gorilla Repository**: Start by forking the [Gorilla repository](https://github.com/ShishirPatil/gorilla) to your GitHub account. | ||
1. Navigate to the `client` folder. | ||
|
||
2. **Clone Your Fork**: Clone the forked repository to your local machine. | ||
```bash | ||
cd client | ||
``` | ||
|
||
2. Install the dependencies: | ||
|
||
```bash | ||
git clone https://github.com/<your-username>/gorilla.git | ||
npm install | ||
``` | ||
|
||
3. **Create a New Branch**: Create a new branch for your changes. | ||
3. Start the development server: | ||
|
||
```bash | ||
git checkout -b your-branch-name | ||
npm start | ||
``` | ||
|
||
4. **Make Your Changes**: Navigate to the `agent-arena/client` folder and make your changes to the frontend. | ||
The app will run in development mode, and you can view it at [http://localhost:3000](http://localhost:3000). | ||
|
||
5. **Test Your Changes**: Make sure to thoroughly test your changes locally before pushing them. | ||
|
||
6. **Commit and Push**: Commit your changes and push them to your forked repository. | ||
## Evaluation Directory | ||
|
||
```bash | ||
git add . | ||
git commit -m "Description of your changes" | ||
git push origin your-branch-name | ||
``` | ||
Agent Arena includes an evaluation directory where we have released the v0 dataset of real agent battles. This dataset includes: | ||
|
||
7. **Create a Pull Request**: Go to the original Gorilla repository and create a Pull Request (PR) from your fork. Provide a detailed description of the changes you've made. | ||
- **Notebook**: A Jupyter notebook (`Agent_Arena_Elo_Rating.ipynb`) that outlines the evaluation process for agents using Elo ratings. | ||
- **Data**: Several JSON files that store the agent, tool, framework, and model ratings. | ||
|
||
## Getting Started with Create React App | ||
To view the dataset and run the evaluation notebook, navigate to the `evaluation` directory: | ||
|
||
This project was bootstrapped with [Create React App](https://github.com/facebook/create-react-app). | ||
1. Open the notebook using Jupyter or any other notebook editor. | ||
|
||
### Available Scripts | ||
2. You can also find the ratings for agents, models, and tools in the respective JSON files in the `evaluation` directory: | ||
- `agent_ratings_V0.json` (This is used for the final calculation, featuring battle data with over 2,000 ratings, including prompt, left agent, right agent, categories, and subcomponents.) | ||
- `toolratings_V0.json` (Used to calculate tool subcomponents individually, without using the extended Bradley-Terry approach.) | ||
- `modelratings_V0.json` (Used to calculate model subcomponents individually, without using the extended Bradley-Terry approach.) | ||
- `frameworkratings_V0.json` (Used to calculate framework subcomponents individually, without using the extended Bradley-Terry approach.) | ||
|
||
In the project directory, you can run: | ||
|
||
#### `npm start` | ||
## Elo Ratings and Evaluation | ||
|
||
Runs the app in the development mode.\ | ||
Open [http://localhost:3000/](http://localhost:3000/) to view it in your browser. | ||
The evaluation combines the **Bradley-Terry model** with **combined subcomponent ratings**. The Bradley-Terry model estimates each agent's strength from head-to-head battle outcomes, and the subcomponent ratings help evaluate individual models, tools, and frameworks. | ||
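To illustrate the Bradley-Terry idea mentioned above, here is a minimal Python sketch that fits strengths from a list of head-to-head results using the standard minorization-maximization update. It is not the notebook's implementation: the function name `fit_bradley_terry`, the `(winner, loser)` input format, and the agent names are assumptions made for illustration, and the extended subcomponent model is not covered.

```python
from collections import defaultdict


def fit_bradley_terry(battles, iterations=200, tol=1e-9):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical helper for illustration only. Returns a dict mapping
    each competitor to a strength; strengths are normalized to sum to 1.
    An agent that never wins ends up with strength 0.
    """
    wins = defaultdict(float)         # total wins per competitor
    pair_counts = defaultdict(float)  # number of battles per unordered pair
    players = set()
    for winner, loser in battles:
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        players.update((winner, loser))

    # Start from equal strengths.
    strengths = {p: 1.0 / len(players) for p in players}

    for _ in range(iterations):
        updated = {}
        for p in players:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = 0.0
            for q in players:
                if q == p:
                    continue
                n = pair_counts.get(frozenset((p, q)), 0.0)
                if n:
                    denom += n / (strengths[p] + strengths[q])
            updated[p] = wins[p] / denom if denom else strengths[p]

        total = sum(updated.values())
        updated = {p: s / total for p, s in updated.items()}

        # Stop once the largest change falls below the tolerance.
        delta = max(abs(updated[p] - strengths[p]) for p in players)
        strengths = updated
        if delta < tol:
            break
    return strengths


def win_probability(strengths, a, b):
    """Bradley-Terry probability that a beats b."""
    return strengths[a] / (strengths[a] + strengths[b])
```

For example, with made-up results `[("agent_a", "agent_b")] * 3 + [("agent_b", "agent_a")]`, calling `win_probability(fit_bradley_terry(battles), "agent_a", "agent_b")` gives the observed win rate of 0.75, which is the maximum-likelihood answer for a two-competitor dataset.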
|
||
The page will reload when you make changes.\ | ||
You may also see any lint errors in the console. | ||
We have also released a **leaderboard** where you can view the current standings of agents. To access the leaderboard, visit: | ||
|
||
#### `npm test` | ||
[Agent Arena Leaderboard](https://www.agent-arena.com/leaderboard) | ||
|
||
Launches the test runner in the interactive watch mode.\ | ||
See the section about [running tests](https://facebook.github.io/create-react-app/docs/running-tests) for more information. | ||
### Instructions to Run | ||
|
||
#### `npm run build` | ||
1. Ensure you have Jupyter installed in your environment. | ||
2. Navigate to the `evaluation` directory. | ||
3. Run the notebook: | ||
|
||
Builds the app for production to the `build` folder.\ | ||
It correctly bundles React in production mode and optimizes the build for the best performance. | ||
Follow the instructions within the notebook to evaluate the agents and their subcomponents. | ||
|
||
The build is minified and the filenames include the hashes.\ | ||
Your app is ready to be deployed! | ||
## Contributing | ||
|
||
See the section about [deployment](https://facebook.github.io/create-react-app/docs/deployment) for more information. | ||
If you'd like to contribute changes to the Agent Arena, you can do so by creating a Pull Request (PR) in the Gorilla repository. Follow these steps: | ||
|
||
#### `npm run eject` | ||
1. Fork the [Gorilla repository](https://github.com/ShishirPatil/gorilla) to your GitHub account. | ||
2. Clone the forked repository to your local machine. | ||
```bash | ||
git clone https://github.com/<your-username>/gorilla.git | ||
``` | ||
3. Create a new branch for your changes. | ||
```bash | ||
git checkout -b your-branch-name | ||
``` | ||
4. Make your changes in the `client/src/components` or other relevant directories. | ||
5. Test your changes thoroughly. | ||
6. Commit your changes and push them to your forked repository. | ||
```bash | ||
git add . | ||
git commit -m "Description of your changes" | ||
git push origin your-branch-name | ||
``` | ||
7. Go to the original Gorilla repository and create a Pull Request from your fork. | ||
|
||
**Note: this is a one-way operation. Once you `eject`, you can't go back!** | ||
We welcome contributions and look forward to seeing your innovative ideas in action! | ||
|
||
If you aren't satisfied with the build tool and configuration choices, you can `eject` at any time. This command will remove the single build dependency from your project. | ||
## Links | ||
|
||
- **Arena**: [Agent-Arena](https://www.agent-arena.com/) | ||
- **Leaderboard**: [Agent Leaderboard](https://www.agent-arena.com/leaderboard) | ||
- **Prompt Hub**: [Prompt Hub](https://www.agent-arena.com/users) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,3 +23,4 @@ yarn-debug.log* | |
yarn-error.log* | ||
.DS_Store | ||
|
||
.env |