This article contains affiliate links. We may earn a commission if you purchase through them — at no extra cost to you.
You’ve read the blog posts about AI agents. You’ve watched the demos. Now you actually want to build one that doesn’t fall apart after two tool calls. That’s exactly what this guide is for.
The LangGraph + Claude stack has become the serious developer’s choice for stateful agents in 2026 — and for good reason. LangGraph gives you explicit control over agent state and execution flow (no more praying your agent doesn’t loop forever), while Claude 3.5 Sonnet and 3.7 Sonnet bring genuinely strong tool-use and instruction-following that makes agent reliability dramatically better than it was two years ago. I’ve built production agents on this stack and I’ll show you exactly how it works, including the parts the official docs gloss over.
What We’re Building
We’re going to build a research assistant agent that can:
- Search the web for information on a given topic
- Decide whether it has enough information or needs to search again
- Summarize findings into a structured report
- Maintain state across the entire workflow so nothing gets lost
This is a realistic, non-trivial agent — not a toy “calculator tool” example. By the end, you’ll understand the LangGraph mental model well enough to build your own agents from scratch.
Why LangGraph Over LangChain Agents or AutoGen?
Before we write a line of code, let’s be honest about the tradeoffs. LangChain’s older AgentExecutor was a black box — great for demos, terrible when you needed to debug why your agent called the wrong tool six times in a row. AutoGen is powerful but its multi-agent orchestration model is overkill for most single-agent use cases and the debugging story is rough.
LangGraph’s core insight is treating agent execution as a directed graph of nodes and edges. Each node is a function. Each edge is a transition. You can add conditional logic, loops, and human-in-the-loop checkpoints explicitly. You can see exactly what state looks like at every step. For production agents, this isn’t optional — it’s the difference between a system you can actually maintain and one you throw away after three months.
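The mental model is small enough to sketch in miniature. The toy executor below is not LangGraph's API, just an illustration of the idea: a node is a function from state to a partial update, and a routing function picks the next node from the current state.

```python
# A toy state-graph executor illustrating the LangGraph mental model.
# This is NOT LangGraph's API -- just the core idea in ~20 lines.

def increment(state: dict) -> dict:
    """A node: takes state, returns a partial state update."""
    return {"count": state["count"] + 1}

def route(state: dict) -> str:
    """A conditional edge: picks the next node based on current state."""
    return "END" if state["count"] >= 3 else "increment"

def run_graph(nodes: dict, router, state: dict, entry: str) -> dict:
    current = entry
    while current != "END":
        state = {**state, **nodes[current](state)}  # merge the partial update
        current = router(state)
    return state

final = run_graph({"increment": increment}, route, {"count": 0}, "increment")
print(final["count"])  # 3
```

Everything LangGraph adds on top of this loop — typed state, reducers, checkpointing, streaming — is machinery around that same node/router cycle, which is why the execution stays inspectable.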
If you’re comparing Claude to other models for this kind of work, check out our Claude vs ChatGPT for Developers review — the short version is Claude’s tool use and long-context handling make it the better choice for agentic workflows right now.
Prerequisites
- Python 3.11+
- An Anthropic API key (Claude 3.5 Sonnet is the sweet spot for cost/performance)
- Basic familiarity with Python and async concepts
```bash
pip install langgraph langchain-anthropic tavily-python
```
We’re using Tavily for web search — it’s purpose-built for LLM agents and returns clean, structured results. You’ll need a free API key from them too.
Get the dev tool stack guide
A weekly breakdown of the tools worth your time — and the ones that aren’t. Join 500+ developers.
No spam. Unsubscribe anytime.
Step 1: Define Your Agent State
This is the part most tutorials skip, and it’s the most important design decision you’ll make. State in LangGraph is a typed dictionary that persists across every node in your graph. Get this wrong and you’ll be refactoring everything later.
```python
from typing import TypedDict, Annotated, List

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages


class ResearchState(TypedDict):
    # The conversation/message history
    messages: Annotated[List[BaseMessage], add_messages]
    # The research topic
    topic: str
    # Search results accumulated across iterations
    search_results: List[str]
    # How many searches we've done (circuit breaker)
    search_count: int
    # Final report output
    final_report: str
    # Whether we have enough info to write the report
    research_complete: bool
```
The `Annotated[List[BaseMessage], add_messages]` pattern is LangGraph-specific: it tells the graph to append new messages rather than replace the whole list. Everything else is a straightforward typed field. Notice the `search_count` field; that's your circuit breaker. Without it, a confused agent will search forever and drain your API budget.
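To make the append-vs-replace distinction concrete, here's a plain-Python sketch of reducer semantics. This mimics the behavior, it is not LangGraph's implementation:

```python
# Plain-Python sketch of per-field reducer semantics -- not LangGraph internals.
def apply_update(state: dict, update: dict, reducers: dict) -> dict:
    """Merge a node's partial update into state, honoring per-field reducers."""
    new_state = dict(state)
    for key, value in update.items():
        if key in reducers:
            new_state[key] = reducers[key](state.get(key, []), value)
        else:
            new_state[key] = value  # default: last write wins
    return new_state

# 'messages' gets an append-style reducer, analogous to add_messages
reducers = {"messages": lambda old, new: old + new}

state = {"messages": ["hi"], "search_count": 0}
state = apply_update(state, {"messages": ["result"], "search_count": 1}, reducers)
print(state["messages"])      # ['hi', 'result']  (appended, not replaced)
print(state["search_count"])  # 1                 (replaced)
```

Without the reducer, every node that returned a `messages` key would wipe out the history; with it, nodes only ever contribute deltas.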
Step 2: Set Up Claude and Your Tools
```python
from langchain_anthropic import ChatAnthropic
from langchain_community.tools.tavily_search import TavilySearchResults

# Claude 3.5 Sonnet is the right call here: 3.7 is better at reasoning,
# but costs more and the latency shows in multi-step agents
llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0,  # Zero temp for consistent tool-use decisions
    max_tokens=4096,
)

search_tool = TavilySearchResults(max_results=3)

# Bind tools to the model so Claude knows what's available
tools = [search_tool]
llm_with_tools = llm.bind_tools(tools)
```
Temperature zero for agents is not a religious debate — it’s just correct. You want deterministic tool-use decisions, not creative ones. Save non-zero temperature for the final report generation step if you want more varied prose.
Step 3: Define Your Node Functions
Each node receives the current state and returns a dictionary of state updates. Keep nodes focused on one responsibility.
```python
from langchain_core.messages import HumanMessage, SystemMessage, ToolMessage
from langgraph.prebuilt import ToolNode

SYSTEM_PROMPT = """You are a research assistant. Your job is to gather information
and write comprehensive reports. When you have enough information (at least 3 good
sources), stop calling tools and summarize what you found. If you need more
information, use the search tool. Be methodical and thorough."""


def research_node(state: ResearchState) -> dict:
    """Main agent node -- Claude decides whether to search or conclude."""
    messages = state["messages"]
    # Add system context if this is the first call
    if not any(isinstance(m, SystemMessage) for m in messages):
        messages = [SystemMessage(content=SYSTEM_PROMPT)] + messages
    response = llm_with_tools.invoke(messages)
    return {
        "messages": [response],
        "search_count": state["search_count"] + (1 if response.tool_calls else 0),
    }


def report_writer_node(state: ResearchState) -> dict:
    """Dedicated node for writing the final report -- uses slightly higher temp."""
    report_llm = ChatAnthropic(
        model="claude-3-5-sonnet-20241022",
        temperature=0.3,
        max_tokens=8192,
    )
    # ToolNode writes search output into the message history as ToolMessages,
    # so gather the results from there as well as from search_results
    tool_outputs = [str(m.content) for m in state["messages"] if isinstance(m, ToolMessage)]
    context = "\n\n".join(state["search_results"] + tool_outputs)
    prompt = f"""Based on the following research, write a comprehensive report about: {state['topic']}

Research gathered:
{context}

Write a well-structured report with an executive summary, key findings, and conclusion."""
    response = report_llm.invoke([HumanMessage(content=prompt)])
    return {
        "final_report": response.content,
        "messages": [response],
    }


# LangGraph's built-in ToolNode handles tool execution automatically
tool_node = ToolNode(tools)
```
Step 4: Wire Up the Graph
This is where LangGraph’s model clicks. You define nodes, then define edges — including conditional edges that route based on state.
```python
from langgraph.graph import StateGraph, END
from langchain_core.messages import AIMessage


def should_continue(state: ResearchState) -> str:
    """Routing function -- decides the next node based on current state."""
    last_message = state["messages"][-1]
    # Circuit breaker: max 5 searches
    if state["search_count"] >= 5:
        return "write_report"
    # If the last AI message has tool calls, execute them
    if isinstance(last_message, AIMessage) and last_message.tool_calls:
        return "tools"
    # Otherwise, write the report
    return "write_report"


# Build the graph
workflow = StateGraph(ResearchState)

# Add nodes
workflow.add_node("research", research_node)
workflow.add_node("tools", tool_node)
workflow.add_node("write_report", report_writer_node)

# Set entry point
workflow.set_entry_point("research")

# Add edges
workflow.add_conditional_edges(
    "research",
    should_continue,
    {
        "tools": "tools",
        "write_report": "write_report",
    },
)

# After tools run, always go back to research
workflow.add_edge("tools", "research")
workflow.add_edge("write_report", END)

# Compile the graph
app = workflow.compile()
```
Step 5: Run Your Agent
```python
from langchain_core.messages import HumanMessage


def run_research_agent(topic: str) -> str:
    initial_state = {
        "messages": [HumanMessage(content=f"Research this topic thoroughly: {topic}")],
        "topic": topic,
        "search_results": [],
        "search_count": 0,
        "final_report": "",
        "research_complete": False,
    }
    result = app.invoke(initial_state)
    return result["final_report"]


# Run it
report = run_research_agent("LangGraph best practices for production agents 2026")
print(report)
```
That’s a working agent. But let’s talk about what you need to do before this goes anywhere near production.
Adding Persistence with LangGraph Checkpointers
The agent above loses all state when the process ends. For any real use case — chatbots, long-running research tasks, anything with a user — you need persistence. LangGraph has built-in checkpointer support:
```python
from langgraph.checkpoint.sqlite import SqliteSaver

# For development -- use PostgresSaver in production
with SqliteSaver.from_conn_string(":memory:") as checkpointer:
    app = workflow.compile(checkpointer=checkpointer)
    config = {"configurable": {"thread_id": "user-session-123"}}

    # First run
    result1 = app.invoke(initial_state, config=config)

    # Resume the same thread later -- state is preserved
    followup_state = {
        "messages": [HumanMessage(content="Now focus specifically on the deployment section")],
        # ... other fields
    }
    result2 = app.invoke(followup_state, config=config)
```
Thread IDs are how LangGraph isolates different user sessions. Generate a UUID per user conversation and you’ve got multi-user state management handled.
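A minimal helper for that (the function name is my own, not a LangGraph API; only the `{"configurable": {"thread_id": ...}}` shape comes from the config used above):

```python
import uuid

def new_thread_config() -> dict:
    """One thread_id per user conversation isolates checkpointed state."""
    return {"configurable": {"thread_id": str(uuid.uuid4())}}

config = new_thread_config()
# Reuse this exact config on every invoke for the same conversation:
# app.invoke(state, config=config)
```

Store the thread ID alongside the user's session so follow-up requests hit the same checkpoint.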
Debugging with LangSmith
I’ll be direct: debugging LangGraph agents without LangSmith is painful. Add these environment variables and every agent run gets full tracing — you can see every node, every LLM call, every tool invocation, and the exact state at each step:
```python
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-key"
os.environ["LANGCHAIN_PROJECT"] = "research-agent"
```
LangSmith has a free tier that covers most development usage. When something goes wrong in a multi-step agent (and it will), the trace view is the only sane way to figure out where it broke. This is non-negotiable for anything beyond a weekend project.
If you’re curious about extending your agent’s capabilities with external tools, our roundup of best MCP servers for coding agents covers integrations that plug directly into Claude-based agents.
Deploying Your Agent
Once your agent is working locally, you need somewhere to run it. For most side projects and early-stage products, a simple FastAPI wrapper deployed to a VPS is the right call — not Lambda, not some fancy serverless setup that fights with LangGraph’s stateful nature.
```python
from fastapi import FastAPI
from pydantic import BaseModel

fastapi_app = FastAPI()


class ResearchRequest(BaseModel):
    topic: str
    thread_id: str


@fastapi_app.post("/research")
async def research_endpoint(request: ResearchRequest):
    config = {"configurable": {"thread_id": request.thread_id}}
    # Use the async ainvoke so a long-running agent doesn't block the event loop
    result = await app.ainvoke(
        {
            "messages": [HumanMessage(content=f"Research: {request.topic}")],
            "topic": request.topic,
            "search_results": [],
            "search_count": 0,
            "final_report": "",
            "research_complete": False,
        },
        config=config,
    )
    return {"report": result["final_report"]}
```
For hosting, DigitalOcean’s App Platform or a basic Droplet is my go-to for this kind of workload — straightforward pricing, good performance, and the $200 credit for new accounts means you can run your agent for months before paying anything. We’ve also got a deeper comparison of hosting options in our best cloud hosting for side projects guide if you want to evaluate alternatives.
Common Mistakes and How to Avoid Them
1. No circuit breaker on tool calls
Claude is good at tool use but not infallible. Without a hard limit on iterations (I use 5 for most agents), a confused agent will loop until you hit rate limits or your credit card cries. Always add a counter to your state and enforce a max in your routing function.
2. Putting too much logic in the LLM
The routing function (should_continue) should be deterministic Python, not another LLM call. I’ve seen people ask Claude to decide whether to continue searching — that’s adding latency, cost, and unpredictability to something that should be a simple conditional.
3. Ignoring token budgets
Each research loop appends messages to state. After 5 searches, you might have 10,000+ tokens of context. Claude 3.5 Sonnet handles 200k context, so this won’t break things, but it will get expensive fast. Consider summarizing search results before appending them, or use a sliding window on the message history.
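One way to do the sliding window, sketched in plain Python with an arbitrary keep-count (the `langchain_core` library also ships a token-aware `trim_messages` utility you may prefer in practice):

```python
def trim_history(messages: list, keep_last: int = 10) -> list:
    """Keep the first (system) message plus the most recent N messages.

    'messages' is any list; in the agent it would be state['messages'].
    keep_last=10 is an arbitrary budget -- tune it to your token limits.
    """
    if len(messages) <= keep_last + 1:
        return messages
    return messages[:1] + messages[-keep_last:]

history = [f"msg-{i}" for i in range(30)]
trimmed = trim_history(history, keep_last=5)
print(len(trimmed))  # 6: the first message plus the last five
```

One caveat: naive trimming can orphan a tool-call message from its tool result, which Claude will reject, so trim at message-pair boundaries in a real agent.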
4. Not handling tool errors
Tavily goes down. Rate limits happen. Wrap your tool node with error handling and add a retry mechanism. LangGraph’s ToolNode will propagate errors to the agent by default, which is fine, but Claude needs to see a meaningful error message to handle it gracefully.
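Here is a generic retry-with-backoff wrapper you could apply to any tool callable. The delays, attempt count, and error-string fallback are illustrative choices, not a prescribed pattern from LangGraph:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Wrap a flaky callable with exponential backoff.

    On final failure, return an error string instead of raising, so the
    model sees a meaningful message and can decide what to do next.
    """
    def wrapped(*args, **kwargs):
        for attempt in range(attempts):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                if attempt == attempts - 1:
                    return f"Tool error after {attempts} attempts: {exc}"
                time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s...
    return wrapped

# Example with a deliberately flaky function:
calls = {"n": 0}
def flaky_search(query: str) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("rate limited")
    return f"results for {query}"

safe_search = with_retries(flaky_search, attempts=3, base_delay=0)
print(safe_search("langgraph"))  # results for langgraph
```

Returning a string on final failure matters: an exception that escapes the graph kills the run, while an error message in the tool output lets Claude apologize, rephrase the query, or fall back to what it already has.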
When to Use Claude 3.7 Instead of 3.5 Sonnet
Claude 3.7 Sonnet is meaningfully better at multi-step reasoning and complex tool orchestration — but it’s slower and costs more. My rule of thumb:
- Use 3.5 Sonnet for agents with clear, well-defined tool schemas and straightforward decision logic. Most research agents, customer support bots, and data extraction pipelines fall here.
- Use 3.7 Sonnet when your agent needs to reason through ambiguous situations, handle edge cases autonomously, or when the cost of a wrong decision is high. Code generation agents and complex planning tasks benefit most.
You can also mix models within the same graph — use 3.5 for the research loop and 3.7 only for the final synthesis step where quality matters most. That’s a legitimate cost optimization strategy.
Full Architecture Summary
| Component | Choice | Why |
|---|---|---|
| Agent Framework | LangGraph | Explicit state, debuggable, production-ready |
| LLM | Claude 3.5 Sonnet | Best tool use + cost balance in 2026 |
| Search Tool | Tavily | Clean structured results, built for LLM agents |
| Persistence | LangGraph + SQLite/Postgres | Built-in checkpointing, thread isolation |
| Observability | LangSmith | Full trace visibility, free tier available |
| Serving | FastAPI | Simple, async-native, easy to deploy |
| Hosting | DigitalOcean | Predictable pricing, good for stateful workloads |
What to Build Next
Once you’ve got this agent running, the natural extensions are:
- **Human-in-the-loop:** LangGraph's `interrupt_before` parameter lets you pause execution and wait for human approval before continuing. Critical for any agent that takes real-world actions.
- **Parallel tool calls:** Claude 3.5+ supports parallel tool calling, so you can fan out to multiple searches simultaneously and join the results, cutting multi-search latency by 60-70%.
- **Subgraphs:** For complex agents, you can nest graphs inside graphs. A planning agent can spawn specialized subgraph agents for different tasks.
- **Streaming:** Use `app.astream_events()` instead of `app.invoke()` to stream tokens to the frontend as they're generated. Essential for any user-facing product.
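The fan-out/join shape of parallel searches is plain asyncio underneath. Here's a sketch with a stand-in `search` coroutine; in a real agent, an async Tavily or tool-node call would take its place:

```python
import asyncio

async def search(query: str) -> str:
    """Stand-in for an async search call (e.g. an async Tavily client)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"results for {query!r}"

async def fan_out(queries: list[str]) -> list[str]:
    """Run all searches concurrently and join the results in order."""
    return await asyncio.gather(*(search(q) for q in queries))

results = asyncio.run(fan_out(["langgraph checkpoints", "claude tool use"]))
print(len(results))  # 2
```

Because `asyncio.gather` preserves input order, you can zip results back to their queries when building the tool messages.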
For a broader look at where this fits in the developer AI ecosystem, our best AI tools for developers roundup covers the full stack beyond just agents.
Final Recommendation
If you’re serious about building AI agents that work reliably in production — not just in demos — the LangGraph + Claude stack is the right choice right now. The graph-based execution model forces you to think clearly about state and control flow, which is exactly the discipline you need when building systems that make autonomous decisions.
Start with the code in this guide, get it running locally, add LangSmith tracing immediately (before you need to debug anything), and deploy to a simple VPS once you’re ready to share it. Don’t over-engineer the infrastructure until you have real users.
The Anthropic API has solid documentation and the langchain-anthropic integration is well-maintained. Claude’s tool use has gotten noticeably more reliable with each release in the 3.x family. This is a stack you can build on.
If you hit walls with hosting or want to understand your deployment options better, DigitalOcean’s $200 credit for new accounts is a genuinely useful way to experiment without commitment — I’ve used it to run agent infrastructure for months while validating ideas before spending real money.
Now stop reading and go build something.