This repo demonstrates how to serve a LangGraph agent application with BentoML.
See here for a full list of BentoML example projects.
- AI Agent Serving: Serve a LangGraph agent as a REST API for easy integration.
- Flexible Invocation: Support both synchronous and asynchronous (queue-based) interactions (see the client sketch below).
- Deployment Options: Run locally or deploy to BentoCloud for scalability.
- LLM Deployment: Use external LLM APIs or deploy an open-source LLM together with the agent API service.
This project serves as a reference implementation designed to be hackable, providing a foundation for building and customizing your own AI agent solutions.
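For orientation, here is a minimal sketch of what a BentoML service wrapping a LangGraph agent can look like. The class name, endpoint names, model choice, and resource settings are illustrative assumptions, not the exact `service.py` shipped in this repo:

```python
import bentoml
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent


@bentoml.service(workers=2, resources={"cpu": "2"})
class SearchAgentService:
    def __init__(self) -> None:
        # Any LangChain-compatible chat model works here; an external
        # OpenAI-compatible API is assumed for brevity.
        llm = ChatOpenAI(model="gpt-4o-mini")
        self.agent = create_react_agent(llm, tools=[])  # add your LangChain tools here

    async def _run(self, input_query: str) -> str:
        # Run the compiled LangGraph graph and return the final message text.
        state = await self.agent.ainvoke({"messages": [("user", input_query)]})
        return state["messages"][-1].content

    # Synchronous REST endpoint: the HTTP call blocks until the agent answers.
    @bentoml.api
    async def invoke(self, input_query: str = "What does BentoML do?") -> str:
        return await self._run(input_query)

    # Task endpoint: clients submit a job and fetch the result later,
    # which is what enables the queue-based invocation listed above.
    @bentoml.task
    async def invoke_task(self, input_query: str = "What does BentoML do?") -> str:
        return await self._run(input_query)
```

The `@bentoml.api` endpoint handles synchronous requests, while the `@bentoml.task` endpoint backs the queue-based invocation mentioned in the feature list.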
Download the source code:
git clone https://github.com/bentoml/BentoLangGraph.git
cd BentoLangGraph/
Follow the step-by-step guide for serving & deploying LangGraph agents with BentoML:
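Once the service is running locally (for example via `bentoml serve`), both invocation styles can be exercised from a Python client. The snippet below is a rough sketch that assumes the `invoke` and `invoke_task` endpoints from the service sketch above and a server listening on port 3000; exact task status values may vary across BentoML versions:

```python
import time

import bentoml

# Assumes the agent service above is running at http://localhost:3000.
with bentoml.SyncHTTPClient("http://localhost:3000") as client:
    # Synchronous invocation: returns once the agent has finished.
    answer = client.invoke(input_query="Summarize what BentoML is.")
    print(answer)

    # Queue-based invocation: submit a task, poll its status, then fetch the result.
    task = client.invoke_task.submit(input_query="Summarize what LangGraph is.")
    while task.get_status().value not in ("success", "failure", "cancelled"):
        time.sleep(1)
    print(task.get())
```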
When running the example code, which uses the DuckDuckGo search tool, you may run into the following rate limit error:
RatelimitException('https://duckduckgo.com/ 202 Ratelimit')
You can use a different tool from LangChain's list of pre-built tools here, or create a custom one.
For example, you can replace DuckDuckGo with Exa for search:
- from langchain_community.tools import DuckDuckGoSearchRun
- tools = [search]
+ from exa_search import retrieve_web_content
+ tools = [retrieve_web_content]
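If you go the Exa route, the `exa_search` module referenced above needs to expose a `retrieve_web_content` tool. The following is only a rough sketch of what such a module could look like, built with LangChain's `@tool` decorator and the `exa_py` client; it is not necessarily the implementation shipped in this repo, and it assumes an `EXA_API_KEY` environment variable:

```python
# exa_search.py -- a hypothetical sketch of a custom Exa-backed search tool.
import os

from exa_py import Exa
from langchain_core.tools import tool

exa = Exa(api_key=os.environ["EXA_API_KEY"])


@tool
def retrieve_web_content(query: str) -> list:
    """Search the web for the query and return page contents."""
    results = exa.search_and_contents(
        query,
        num_results=5,
        text=True,
    )
    return [r.text for r in results.results]
```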
Interested in using LangGraph with other open-source LLMs? Check out BentoVLLM for more sample code.
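In that setup, the agent's chat model simply points at the OpenAI-compatible endpoint exposed by your self-hosted LLM. A rough sketch, where the base URL, model ID, and key are placeholders for whatever your deployment exposes:

```python
from langchain_openai import ChatOpenAI

# Point the agent's model at a self-hosted, OpenAI-compatible server
# (e.g. one deployed with BentoVLLM). URL, model ID, and key are placeholders.
llm = ChatOpenAI(
    base_url="http://localhost:3000/v1",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    api_key="not-needed-for-local-servers",
)
```

The rest of the agent code is unchanged; `create_react_agent` only needs a LangChain-compatible chat model.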
Join the BentoML developer community on Slack for more support and discussions!