This article contains affiliate links. We may earn a commission if you sign up through our links — at no extra cost to you.
You’ve read the LangChain docs. You’ve seen the demo videos. Now you actually want to build something that works — a real AI agent that maintains state, makes decisions, calls tools, and doesn’t hallucinate its way into a support ticket. This guide covers exactly how to build an AI agent with LangGraph and Claude, end to end, with working code and honest opinions about where things break.
Why this stack specifically? LangGraph handles the hard part of stateful agent orchestration — the loops, the branching, the memory between steps. Claude (especially Claude 3.5 Sonnet and 3.7 Sonnet) is currently the best model for agents that need to reason carefully before acting. It follows tool call schemas reliably, handles long context without losing the thread, and is noticeably less likely to go rogue than some alternatives. I’ve built agents on GPT-4o, Gemini, and Claude — Claude wins for agentic workloads right now. (For a deeper comparison, see Claude vs ChatGPT for Developers.)
What You’re Building
A research assistant agent that can:
- Accept a user question
- Search the web (via a tool)
- Decide whether it has enough information or needs to search again
- Return a grounded, cited answer
This is the canonical “ReAct” loop — Reason, Act, Observe, repeat. It’s simple enough to understand fully but complex enough that you’ll hit every real pain point: state management, tool errors, infinite loops, and token costs.
Prerequisites
- Python 3.11+
- An Anthropic API key (get one at console.anthropic.com)
- Basic familiarity with Python async and type hints
```bash
pip install langgraph langchain-anthropic tavily-python
```
For web search, I’m using Tavily — it’s purpose-built for LLM agents and returns clean structured results. A free tier exists. You could swap in SerpAPI or a custom scraper, but Tavily is the path of least resistance here.
Understanding LangGraph’s Core Concepts (Quickly)
LangGraph models your agent as a directed graph. Here’s what that means in practice:
- State: A typed dictionary that persists across every node in the graph. Think of it as the agent’s working memory.
- Nodes: Functions that take state, do something (call an LLM, run a tool, transform data), and return an updated state.
- Edges: Connections between nodes. They can be static (always go from A to B) or conditional (go to B or C depending on state).
- Checkpointing: LangGraph can persist state to a database between runs. This is what enables long-running agents, human-in-the-loop, and resumable workflows.
The mental model that clicked for me: LangGraph is a state machine where the LLM gets to influence which transition fires next. That’s it. Everything else is plumbing.
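That mental model can be sketched without any framework: a dict of node functions, a router function for the conditional edge, and a while loop. This is illustrative only, with made-up node logic standing in for the LLM; LangGraph layers typing, checkpointing, and parallel execution on top of exactly this shape.

```python
# A minimal, framework-free sketch of the LangGraph mental model:
# nodes transform state, a router picks the next transition, loop until END.
END = "__end__"

def agent_node(state):
    # "Reason": pretend the LLM decides it needs one search before answering
    state["steps"] += 1
    return state

def tool_node(state):
    # "Act": pretend a tool ran and we observed a result
    state["observations"].append(f"result-{state['steps']}")
    return state

def router(state):
    # Conditional edge: the node's output decides which transition fires
    return "tools" if state["steps"] < 2 else END

nodes = {"agent": agent_node, "tools": tool_node}
edges = {"agent": router, "tools": lambda s: "agent"}  # tools always returns to agent

state, current = {"steps": 0, "observations": []}, "agent"
while current != END:
    state = nodes[current](state)
    current = edges[current](state)
# After the loop: two agent steps, one tool observation
```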
Step 1: Define Your Agent State
```python
from typing import TypedDict, Annotated, List

from langgraph.graph.message import add_messages
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
    messages: Annotated[List[BaseMessage], add_messages]
    search_count: int
    final_answer: str | None
```
The add_messages annotation is important — it tells LangGraph to append new messages rather than overwrite the list. search_count is our circuit breaker; we’ll use it to prevent infinite search loops. final_answer will be set when the agent decides it’s done.
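To see what the reducer annotation buys you, here is a simplified, framework-free sketch of the update semantics. This is not LangGraph's actual implementation (among other things, `add_messages` also deduplicates by message ID), but it shows why annotated fields accumulate while plain fields get overwritten:

```python
from typing import Annotated, TypedDict
import operator

# Conceptual sketch: LangGraph reads the Annotated metadata as a "reducer".
# operator.add stands in for add_messages (both append lists).
class CounterState(TypedDict):
    items: Annotated[list, operator.add]  # appended, not overwritten
    count: int                            # plain field: overwritten on update

def apply_update(state: dict, update: dict) -> dict:
    """Roughly what happens when a node returns a partial state update."""
    new_state = dict(state)
    for key, value in update.items():
        # Look up the reducer on the Annotated field, if there is one
        annotation = CounterState.__annotations__[key]
        reducer = getattr(annotation, "__metadata__", (None,))[0]
        new_state[key] = reducer(state[key], value) if reducer else value
    return new_state

state = {"items": ["a"], "count": 1}
state = apply_update(state, {"items": ["b"], "count": 2})
# items accumulates to ["a", "b"]; count is simply replaced with 2
```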
Step 2: Set Up Claude and Your Tools
```python
import os

from langchain_anthropic import ChatAnthropic
from langchain_community.tools.tavily_search import TavilySearchResults

# Use Claude 3.5 Sonnet for the best cost/performance ratio on agents
llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0,  # Deterministic for tool calls
    api_key=os.environ["ANTHROPIC_API_KEY"]
)

search_tool = TavilySearchResults(
    max_results=3,
    api_key=os.environ["TAVILY_API_KEY"]
)

tools = [search_tool]
llm_with_tools = llm.bind_tools(tools)
```
Set temperature=0 for agents. I cannot stress this enough. You want deterministic tool call decisions, not creative ones. Save the temperature for the final answer generation step if you need it.
Why Claude 3.5 Sonnet over 3.7? For most agent workloads, 3.5 Sonnet is faster and cheaper while matching 3.7’s tool-calling accuracy. Use 3.7 Sonnet when your agent needs extended thinking — complex multi-step reasoning where you want it to slow down before acting. For a straightforward research loop, 3.5 Sonnet is the right call.
Step 3: Build the Graph Nodes
```python
import json

from langchain_core.messages import ToolMessage

def call_model(state: AgentState) -> AgentState:
    """The reasoning node — Claude decides what to do next."""
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def run_tools(state: AgentState) -> AgentState:
    """Execute whatever tool calls Claude requested."""
    last_message = state["messages"][-1]
    tool_results = []
    for tool_call in last_message.tool_calls:
        if tool_call["name"] == "tavily_search_results_json":
            result = search_tool.invoke(tool_call["args"])
            tool_results.append(
                ToolMessage(
                    content=json.dumps(result),
                    tool_call_id=tool_call["id"]
                )
            )
    return {
        "messages": tool_results,
        "search_count": state["search_count"] + 1
    }
```
Step 4: Add Conditional Routing
This is where LangGraph earns its keep. We need logic that says: “Did Claude call a tool? Route to tool execution. Did Claude give a final answer? We’re done. Have we searched too many times? Force a stop.”
```python
from typing import Literal

def should_continue(state: AgentState) -> Literal["tools", "end"]:
    last_message = state["messages"][-1]

    # Hard stop after 5 searches — adjust based on your use case
    if state["search_count"] >= 5:
        return "end"

    # If Claude made tool calls, execute them
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"

    # No tool calls = Claude is done reasoning
    return "end"
```
That search_count >= 5 guard is not optional. Without it, you will eventually hit an agent that loops forever because it keeps deciding it needs more information. It happens. Put the guard in.
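As a second layer of defense, LangGraph itself enforces a per-run `recursion_limit` (25 super-steps by default, at the time of writing) and raises `GraphRecursionError` when it is exceeded. Unlike the `search_count` guard, which lets the agent return a graceful final answer, hitting the recursion limit aborts the run, so treat it as a backstop rather than a replacement. It travels in the standard config dict:

```python
# Belt and braces: cap total super-steps per run in addition to search_count.
# Exceeding this raises langgraph.errors.GraphRecursionError instead of
# returning a partial answer, so keep it above your expected worst case.
config = {"recursion_limit": 15}

# Usage (commented out here because it needs the compiled graph from Step 5):
# result = agent.invoke(initial_state, config=config)
```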
Step 5: Assemble the Graph
```python
from langgraph.graph import StateGraph, END

def build_agent():
    graph = StateGraph(AgentState)

    # Add nodes
    graph.add_node("agent", call_model)
    graph.add_node("tools", run_tools)

    # Entry point
    graph.set_entry_point("agent")

    # Conditional routing from the agent node
    graph.add_conditional_edges(
        "agent",
        should_continue,
        {
            "tools": "tools",
            "end": END
        }
    )

    # After tool execution, always go back to the agent
    graph.add_edge("tools", "agent")

    return graph.compile()

agent = build_agent()
```
Step 6: Run It
```python
from langchain_core.messages import HumanMessage

def run_agent(question: str) -> str:
    initial_state = {
        "messages": [HumanMessage(content=question)],
        "search_count": 0,
        "final_answer": None
    }
    result = agent.invoke(initial_state)
    # The last message is Claude's final response
    return result["messages"][-1].content

# Try it
answer = run_agent(
    "What are the main differences between LangGraph and AutoGen for building AI agents?"
)
print(answer)
```
If everything is wired up correctly, you’ll see Claude search, read the results, possibly search again, and return a grounded answer with sources. The whole round trip typically takes 5-15 seconds depending on search latency.
Adding Persistence (The Feature That Actually Matters)
A stateless agent is a toy. Real agents need to remember context across sessions. LangGraph’s checkpointing makes this straightforward:
```python
from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.graph import StateGraph, END

def build_agent_with_checkpointer(checkpointer):
    """Same graph as build_agent(), but compiled with a checkpointer."""
    graph = StateGraph(AgentState)
    graph.add_node("agent", call_model)
    graph.add_node("tools", run_tools)
    graph.set_entry_point("agent")
    graph.add_conditional_edges(
        "agent", should_continue, {"tools": "tools", "end": END}
    )
    graph.add_edge("tools", "agent")
    return graph.compile(checkpointer=checkpointer)

# For production, swap SqliteSaver for PostgresSaver
with SqliteSaver.from_conn_string("agent_memory.db") as checkpointer:
    agent = build_agent_with_checkpointer(checkpointer)

    # thread_id groups messages into a conversation
    config = {"configurable": {"thread_id": "user-123-session-1"}}

    # First message
    result1 = agent.invoke(
        {"messages": [HumanMessage(content="What is LangGraph?")],
         "search_count": 0, "final_answer": None},
        config=config
    )

    # Follow-up — the agent remembers the previous exchange
    result2 = agent.invoke(
        {"messages": [HumanMessage(content="How does it compare to plain LangChain?")],
         "search_count": 0, "final_answer": None},
        config=config
    )
```
For production, use PostgresSaver or AsyncSqliteSaver. SQLite works fine for local development but will bottleneck under concurrent users. If you’re deploying this as a service, you’ll want a proper VPS — DigitalOcean’s managed Postgres pairs well here and their $200 credit for new accounts covers a solid amount of experimentation.
Debugging With LangSmith
Here’s an honest take: debugging LangGraph agents without tracing is miserable. You’re staring at a stream of messages trying to figure out why the agent made a wrong turn on step 3 of 7. LangSmith fixes this.
Add these environment variables and you get full trace visualization in the LangSmith UI — every node, every LLM call, every tool result, latency, and token count:
```bash
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=your_langsmith_key
export LANGCHAIN_PROJECT=my-research-agent
```
That’s it. No code changes. LangSmith’s free tier covers 5,000 traces/month, which is plenty for development. Once you’re in production and need to debug a specific user complaint, you’ll search by thread ID and see exactly what happened. This is not optional infrastructure — it’s how you actually maintain agents.
Common Failures and How to Fix Them
The Agent Loops Forever
You forgot the circuit breaker. Add search_count tracking and a hard stop. Also check whether your tool is returning empty results — Claude sometimes keeps searching when it gets nothing back.
Tool Call Schema Errors
Claude is strict about tool schemas. If you’re getting validation errors, print llm_with_tools.kwargs["tools"] and verify the JSON schema matches what you’re passing. The most common mistake is using Python types directly instead of JSON Schema types.
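For reference, this is roughly the shape a tool definition takes on the wire in Anthropic's tool-use format. Note the JSON Schema type names ("string", "integer") rather than Python's str and int; the `web_search` tool here is a made-up example, not part of the agent above:

```python
# A well-formed tool schema in Anthropic's tool-use format.
# The common mistake: writing "type": str instead of "type": "string".
search_schema = {
    "name": "web_search",
    "description": "Search the web for a query.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query"},
            "max_results": {"type": "integer", "default": 3},
        },
        "required": ["query"],
    },
}
```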
Context Window Overflows
Long agent runs accumulate a lot of messages. You’ll hit Claude’s context limit (200k tokens for 3.5/3.7 Sonnet), and input-token cost grows with every message you keep. Add a message trimming step that keeps the last N messages plus the original system prompt. LangChain ships a trim_messages utility (in langchain_core.messages) for exactly this.
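Here is a framework-free sketch of that trimming policy. It counts messages rather than tokens (the real trim_messages can count tokens), and in a real agent you must also avoid orphaning a tool call from its tool result when you cut the middle of the history:

```python
# Keep the first (system) message plus the most recent N, dropping the middle.
# Caution: naive cuts can separate a tool call from its tool result —
# the real trim_messages utility has options to handle this.
def trim_history(messages: list, keep_last: int = 6) -> list:
    if len(messages) <= keep_last + 1:
        return messages  # nothing to trim
    return [messages[0]] + messages[-keep_last:]

history = ["system"] + [f"msg-{i}" for i in range(20)]
trimmed = trim_history(history, keep_last=6)
# trimmed keeps "system" plus msg-14 through msg-19 (7 items total)
```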
Slow Performance
If your agent is taking 30+ seconds per run, the bottleneck is almost always tool latency, not the LLM. Profile your tool calls. Tavily is usually fast; custom scrapers are usually slow. Consider running tool calls in parallel when the agent requests multiple tools in one step.
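A sketch of that parallelization with a standard-library thread pool, using stand-in lambdas for the real tools (network-bound tool calls like search are exactly where threads help):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for real tools; each would normally make a network call.
fake_tools = {
    "search": lambda args: f"search:{args['q']}",
    "fetch": lambda args: f"fetch:{args['url']}",
}

def run_tool_calls_parallel(tool_calls: list) -> list:
    """Execute all tool calls from one agent turn concurrently."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(fake_tools[c["name"]], c["args"]) for c in tool_calls]
        # f.result() in submission order keeps results aligned with requests
        return [f.result() for f in futures]

results = run_tool_calls_parallel([
    {"name": "search", "args": {"q": "langgraph"}},
    {"name": "fetch", "args": {"url": "https://example.com"}},
])
# results are in the same order the calls were requested
```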
Deploying Your Agent
For a simple HTTP wrapper, FastAPI works well:
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    question: str
    thread_id: str = "default"

@app.post("/agent/query")
async def query_agent(request: QueryRequest):
    answer = run_agent(request.question)
    return {"answer": answer, "thread_id": request.thread_id}
```
For hosting, you have options. LangGraph Platform (LangChain’s managed offering, formerly LangGraph Cloud) handles scaling and persistence for you — worth it if you’re building a product. For side projects or internal tools, a small VPS is cheaper. I’ve had good results with DigitalOcean droplets for this kind of workload — a $12/month droplet handles a surprising amount of traffic for a single-agent service. See our best cloud hosting for side projects breakdown for a fuller comparison.
If you want to integrate your agent with MCP servers to expand its tool access (file systems, databases, APIs), check out our guide to best MCP servers for coding agents — it pairs naturally with this setup.
When to Use This Stack vs. Alternatives
| Use Case | LangGraph + Claude | Alternative |
|---|---|---|
| Stateful multi-step agents | ✅ Excellent | AutoGen (more complex setup) |
| Human-in-the-loop workflows | ✅ First-class support | CrewAI (less granular control) |
| Simple one-shot Q&A | ⚠️ Overkill | Direct Claude API call |
| Multi-agent coordination | ✅ Supported via subgraphs | AutoGen (purpose-built for this) |
| Production reliability | ✅ Checkpointing + LangSmith | Most alternatives lack this |
| Rapid prototyping | ⚠️ Some boilerplate | Pydantic AI (less setup) |
Final Recommendation
LangGraph + Claude is the right stack for production agents in 2026. It’s not the fastest to prototype with — you’ll write more boilerplate than with simpler frameworks — but that boilerplate buys you explicit state management, proper error handling, and observability. Those aren’t nice-to-haves; they’re what separates a demo from something you can actually maintain.
Start with the exact pattern above. Get it working. Then add complexity: more tools, subgraphs for specialized tasks, human-in-the-loop interrupts for high-stakes decisions. LangGraph scales with you in a way that “magic” frameworks don’t.
The one thing I’d change if starting fresh: instrument with LangSmith from day one, not as an afterthought. You’ll thank yourself the first time an agent does something unexpected in production and you can actually see why.
For more on how Claude stacks up as your agent’s brain, read our detailed Claude vs ChatGPT developer review. And if you’re evaluating the broader AI tooling ecosystem, our best AI tools for developers roundup covers what’s actually worth your time in 2026.