This article contains affiliate links. We may earn a commission if you purchase through them — at no extra cost to you.
You’ve read the blog posts about AI agents. You’ve watched the demos. Now you actually want to build one that doesn’t fall apart after two tool calls. That’s exactly what this guide is for.
The LangGraph + Claude stack has become the serious developer’s choice for stateful agents in 2026 — and for good reason. LangGraph gives you explicit control over agent state and execution flow (no more praying your agent doesn’t loop forever), while Claude 3.5 Sonnet and 3.7 Sonnet bring genuinely strong tool-use and instruction-following that makes agent reliability dramatically better than it was two years ago. I’ve built production agents on this stack and I’ll show you exactly how it works, including the parts the official docs gloss over.
What We’re Building
We’re going to build a research assistant agent that can:
- Search the web for information on a given topic
- Decide whether it has enough information or needs to search again
- Summarize findings into a structured report
- Maintain state across the entire workflow so nothing gets lost
This is a realistic, non-trivial agent — not a toy “calculator tool” example. By the end, you’ll understand the LangGraph mental model well enough to build your own agents from scratch.
Why LangGraph Over LangChain Agents or AutoGen?
Before we write a line of code, let’s be honest about the tradeoffs. LangChain’s older AgentExecutor was a black box — great for demos, terrible when you needed to debug why your agent called the wrong tool six times in a row. AutoGen is powerful but its multi-agent orchestration model is overkill for most single-agent use cases and the debugging story is rough.
LangGraph’s core insight is treating agent execution as a directed graph of nodes and edges. Each node is a function. Each edge is a transition. You can add conditional logic, loops, and human-in-the-loop checkpoints explicitly. You can see exactly what state looks like at every step. For production agents, this isn’t optional — it’s the difference between a system you can actually maintain and one you throw away after three months.
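The mental model is small enough to sketch in miniature. The toy executor below is not LangGraph's API, just an illustration of the idea: a node is a function from state to a partial update, and a routing function picks the next node from the current state.

```python
# A toy state-graph executor illustrating the LangGraph mental model.
# This is NOT LangGraph's API -- just the core idea in ~20 lines.

def increment(state: dict) -> dict:
    """A node: takes state, returns a partial state update."""
    return {"count": state["count"] + 1}

def route(state: dict) -> str:
    """A conditional edge: picks the next node based on current state."""
    return "END" if state["count"] >= 3 else "increment"

def run_graph(nodes: dict, router, state: dict, entry: str) -> dict:
    current = entry
    while current != "END":
        state = {**state, **nodes[current](state)}  # merge the partial update
        current = router(state)
    return state

final = run_graph({"increment": increment}, route, {"count": 0}, "increment")
print(final["count"])  # 3
```

Everything LangGraph adds on top of this loop — typed state, reducers, checkpointing, streaming — is machinery around that same node/router cycle, which is why the execution stays inspectable.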
If you’re comparing Claude to other models for this kind of work, check out our Claude vs ChatGPT for Developers review — the short version is Claude’s tool use and long-context handling make it the better choice for agentic workflows right now.
Prerequisites
- Python 3.11+
- An Anthropic API key (Claude 3.5 Sonnet is the sweet spot for cost/performance)
- Basic familiarity with Python and async concepts
```bash
pip install langgraph langchain-anthropic tavily-python
```
We’re using Tavily for web search — it’s purpose-built for LLM agents and returns clean, structured results. You’ll need a free API key from them too.
Get the dev tool stack guide
A weekly breakdown of the tools worth your time — and the ones that aren’t. Join 500+ developers.
No spam. Unsubscribe anytime.
Step 1: Define Your Agent State
This is the part most tutorials skip, and it’s the most important design decision you’ll make. State in LangGraph is a typed dictionary that persists across every node in your graph. Get this wrong and you’ll be refactoring everything later.
```python
from typing import TypedDict, Annotated, List

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages


class ResearchState(TypedDict):
    # The conversation/message history
    messages: Annotated[List[BaseMessage], add_messages]
    # The research topic
    topic: str
    # Search results accumulated across iterations
    search_results: List[str]
    # How many searches we've done (circuit breaker)
    search_count: int
    # Final report output
    final_report: str
    # Whether we have enough info to write the report
    research_complete: bool
```
The `Annotated[List[BaseMessage], add_messages]` pattern is LangGraph-specific: it tells the graph to append new messages rather than replace the whole list. Everything else is a straightforward typed field. Notice the `search_count` field; that's your circuit breaker. Without it, a confused agent will search forever and drain your API budget.
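To make the append-vs-replace distinction concrete, here's a plain-Python sketch of reducer semantics. This mimics the behavior, it is not LangGraph's implementation:

```python
# Plain-Python sketch of per-field reducer semantics -- not LangGraph internals.
def apply_update(state: dict, update: dict, reducers: dict) -> dict:
    """Merge a node's partial update into state, honoring per-field reducers."""
    new_state = dict(state)
    for key, value in update.items():
        if key in reducers:
            new_state[key] = reducers[key](state.get(key, []), value)
        else:
            new_state[key] = value  # default: last write wins
    return new_state

# 'messages' gets an append-style reducer, analogous to add_messages
reducers = {"messages": lambda old, new: old + new}

state = {"messages": ["hi"], "search_count": 0}
state = apply_update(state, {"messages": ["result"], "search_count": 1}, reducers)
print(state["messages"])      # ['hi', 'result']  (appended, not replaced)
print(state["search_count"])  # 1                 (replaced)
```

Without the reducer, every node that returned a `messages` key would wipe out the history; with it, nodes only ever contribute deltas.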
Step 2: Set Up Claude and Your Tools
```python
from langchain_anthropic import ChatAnthropic
from langchain_community.tools.tavily_search import TavilySearchResults

# Claude 3.5 Sonnet is the right call here: 3.7 is better at reasoning,
# but costs more and the latency shows in multi-step agents
llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0,  # Zero temp for consistent tool-use decisions
    max_tokens=4096,
)

search_tool = TavilySearchResults(max_results=3)

# Bind tools to the model so Claude knows what's available
tools = [search_tool]
llm_with_tools = llm.bind_tools(tools)
```
Temperature zero for agents is not a religious debate — it’s just correct. You want deterministic tool-use decisions, not creative ones. Save non-zero temperature for the final report generation step if you want more varied prose.
Step 3: Define Your Node Functions
Each node receives the current state and returns a dictionary of state updates. Keep nodes focused on one responsibility.
```python
from langchain_core.messages import HumanMessage, SystemMessage, ToolMessage
from langgraph.prebuilt import ToolNode

SYSTEM_PROMPT = """You are a research assistant. Your job is to gather information
and write comprehensive reports. When you have enough information (at least 3 good
sources), stop calling tools and summarize what you found. If you need more
information, use the search tool. Be methodical and thorough."""


def research_node(state: ResearchState) -> dict:
    """Main agent node -- Claude decides whether to search or conclude."""
    messages = state["messages"]
    # Add system context if this is the first call
    if not any(isinstance(m, SystemMessage) for m in messages):
        messages = [SystemMessage(content=SYSTEM_PROMPT)] + messages
    response = llm_with_tools.invoke(messages)
    return {
        "messages": [response],
        "search_count": state["search_count"] + (1 if response.tool_calls else 0),
    }


def report_writer_node(state: ResearchState) -> dict:
    """Dedicated node for writing the final report -- uses slightly higher temp."""
    report_llm = ChatAnthropic(
        model="claude-3-5-sonnet-20241022",
        temperature=0.3,
        max_tokens=8192,
    )
    # ToolNode writes search output into the message history as ToolMessages,
    # so gather the results from there as well as from search_results
    tool_outputs = [str(m.content) for m in state["messages"] if isinstance(m, ToolMessage)]
    context = "\n\n".join(state["search_results"] + tool_outputs)
    prompt = f"""Based on the following research, write a comprehensive report about: {state['topic']}

Research gathered:
{context}

Write a well-structured report with an executive summary, key findings, and conclusion."""
    response = report_llm.invoke([HumanMessage(content=prompt)])
    return {
        "final_report": response.content,
        "messages": [response],
    }


# LangGraph's built-in ToolNode handles tool execution automatically
tool_node = ToolNode(tools)
```
Step 4: Wire Up the Graph
This is where LangGraph’s model clicks. You define nodes, then define edges — including conditional edges that route based on state.
```python
from langgraph.graph import StateGraph, END
from langchain_core.messages import AIMessage


def should_continue(state: ResearchState) -> str:
    """Routing function -- decides the next node based on current state."""
    last_message = state["messages"][-1]
    # Circuit breaker: max 5 searches
    if state["search_count"] >= 5:
        return "write_report"
    # If the last AI message has tool calls, execute them
    if isinstance(last_message, AIMessage) and last_message.tool_calls:
        return "tools"
    # Otherwise, write the report
    return "write_report"


# Build the graph
workflow = StateGraph(ResearchState)

# Add nodes
workflow.add_node("research", research_node)
workflow.add_node("tools", tool_node)
workflow.add_node("write_report", report_writer_node)

# Set entry point
workflow.set_entry_point("research")

# Add edges
workflow.add_conditional_edges(
    "research",
    should_continue,
    {
        "tools": "tools",
        "write_report": "write_report",
    },
)

# After tools run, always go back to research
workflow.add_edge("tools", "research")
workflow.add_edge("write_report", END)

# Compile the graph
app = workflow.compile()
```
Step 5: Run Your Agent
```python
from langchain_core.messages import HumanMessage


def run_research_agent(topic: str) -> str:
    initial_state = {
        "messages": [HumanMessage(content=f"Research this topic thoroughly: {topic}")],
        "topic": topic,
        "search_results": [],
        "search_count": 0,
        "final_report": "",
        "research_complete": False,
    }
    result = app.invoke(initial_state)
    return result["final_report"]


# Run it
report = run_research_agent("LangGraph best practices for production agents 2026")
print(report)
```
That’s a working agent. But let’s talk about what you need to do before this goes anywhere near production.
Adding Persistence with LangGraph Checkpointers
The agent above loses all state when the process ends. For any real use case — chatbots, long-running research tasks, anything with a user — you need persistence. LangGraph has built-in checkpointer support:
```python
from langgraph.checkpoint.sqlite import SqliteSaver

# For development -- use PostgresSaver in production
with SqliteSaver.from_conn_string(":memory:") as checkpointer:
    app = workflow.compile(checkpointer=checkpointer)
    config = {"configurable": {"thread_id": "user-session-123"}}

    # First run
    result1 = app.invoke(initial_state, config=config)

    # Resume the same thread later -- state is preserved
    followup_state = {
        "messages": [HumanMessage(content="Now focus specifically on the deployment section")],
        # ... other fields
    }
    result2 = app.invoke(followup_state, config=config)
```
Thread IDs are how LangGraph isolates different user sessions. Generate a UUID per user conversation and you’ve got multi-user state management handled.
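A minimal helper for that (the function name is my own, not a LangGraph API; only the `{"configurable": {"thread_id": ...}}` shape comes from the config used above):

```python
import uuid

def new_thread_config() -> dict:
    """One thread_id per user conversation isolates checkpointed state."""
    return {"configurable": {"thread_id": str(uuid.uuid4())}}

config = new_thread_config()
# Reuse this exact config on every invoke for the same conversation:
# app.invoke(state, config=config)
```

Store the thread ID alongside the user's session so follow-up requests hit the same checkpoint.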
Debugging with LangSmith
I’ll be direct: debugging LangGraph agents without LangSmith is painful. Add these environment variables and every agent run gets full tracing — you can see every node, every LLM call, every tool invocation, and the exact state at each step:
```python
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-key"
os.environ["LANGCHAIN_PROJECT"] = "research-agent"
```
LangSmith has a free tier that covers most development usage. When something goes wrong in a multi-step agent (and it will), the trace view is the only sane way to figure out where it broke. This is non-negotiable for anything beyond a weekend project.
If you’re curious about extending your agent’s capabilities with external tools, our roundup of best MCP servers for coding agents covers integrations that plug directly into Claude-based agents.
Deploying Your Agent
Once your agent is working locally, you need somewhere to run it. For most side projects and early-stage products, a simple FastAPI wrapper deployed to a VPS is the right call — not Lambda, not some fancy serverless setup that fights with LangGraph’s stateful nature.
```python
from fastapi import FastAPI
from pydantic import BaseModel

fastapi_app = FastAPI()


class ResearchRequest(BaseModel):
    topic: str
    thread_id: str


@fastapi_app.post("/research")
async def research_endpoint(request: ResearchRequest):
    config = {"configurable": {"thread_id": request.thread_id}}
    # Use the async ainvoke so a long-running agent doesn't block the event loop
    result = await app.ainvoke(
        {
            "messages": [HumanMessage(content=f"Research: {request.topic}")],
            "topic": request.topic,
            "search_results": [],
            "search_count": 0,
            "final_report": "",
            "research_complete": False,
        },
        config=config,
    )
    return {"report": result["final_report"]}
```
For hosting, DigitalOcean’s App Platform or a basic Droplet is my go-to for this kind of workload — straightforward pricing, good performance, and the $200 credit for new accounts means you can run your agent for months before paying anything. We’ve also got a deeper comparison of hosting options in our best cloud hosting for side projects guide if you want to evaluate alternatives.
Common Mistakes and How to Avoid Them
1. No circuit breaker on tool calls
Claude is good at tool use but not infallible. Without a hard limit on iterations (I use 5 for most agents), a confused agent will loop until you hit rate limits or your credit card cries. Always add a counter to your state and enforce a max in your routing function.
2. Putting too much logic in the LLM
The routing function (should_continue) should be deterministic Python, not another LLM call. I’ve seen people ask Claude to decide whether to continue searching — that’s adding latency, cost, and unpredictability to something that should be a simple conditional.
3. Ignoring token budgets
Each research loop appends messages to state. After 5 searches, you might have 10,000+ tokens of context. Claude 3.5 Sonnet handles 200k context, so this won’t break things, but it will get expensive fast. Consider summarizing search results before appending them, or use a sliding window on the message history.
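One way to do the sliding window, sketched in plain Python with an arbitrary keep-count (the `langchain_core` library also ships a token-aware `trim_messages` utility you may prefer in practice):

```python
def trim_history(messages: list, keep_last: int = 10) -> list:
    """Keep the first (system) message plus the most recent N messages.

    'messages' is any list; in the agent it would be state['messages'].
    keep_last=10 is an arbitrary budget -- tune it to your token limits.
    """
    if len(messages) <= keep_last + 1:
        return messages
    return messages[:1] + messages[-keep_last:]

history = [f"msg-{i}" for i in range(30)]
trimmed = trim_history(history, keep_last=5)
print(len(trimmed))  # 6: the first message plus the last five
```

One caveat: naive trimming can orphan a tool-call message from its tool result, which Claude will reject, so trim at message-pair boundaries in a real agent.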
4. Not handling tool errors
Tavily goes down. Rate limits happen. Wrap your tool node with error handling and add a retry mechanism. LangGraph’s ToolNode will propagate errors to the agent by default, which is fine, but Claude needs to see a meaningful error message to handle it gracefully.
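Here is a generic retry-with-backoff wrapper you could apply to any tool callable. The delays, attempt count, and error-string fallback are illustrative choices, not a prescribed pattern from LangGraph:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Wrap a flaky callable with exponential backoff.

    On final failure, return an error string instead of raising, so the
    model sees a meaningful message and can decide what to do next.
    """
    def wrapped(*args, **kwargs):
        for attempt in range(attempts):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                if attempt == attempts - 1:
                    return f"Tool error after {attempts} attempts: {exc}"
                time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s...
    return wrapped

# Example with a deliberately flaky function:
calls = {"n": 0}
def flaky_search(query: str) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("rate limited")
    return f"results for {query}"

safe_search = with_retries(flaky_search, attempts=3, base_delay=0)
print(safe_search("langgraph"))  # results for langgraph
```

Returning a string on final failure matters: an exception that escapes the graph kills the run, while an error message in the tool output lets Claude apologize, rephrase the query, or fall back to what it already has.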
When to Use Claude 3.7 Instead of 3.5 Sonnet
Claude 3.7 Sonnet is meaningfully better at multi-step reasoning and complex tool orchestration — but it’s slower and costs more. My rule of thumb:
- Use 3.5 Sonnet for agents with clear, well-defined tool schemas and straightforward decision logic. Most research agents, customer support bots, and data extraction pipelines fall here.
- Use 3.7 Sonnet when your agent needs to reason through ambiguous situations, handle edge cases autonomously, or when the cost of a wrong decision is high. Code generation agents and complex planning tasks benefit most.
You can also mix models within the same graph — use 3.5 for the research loop and 3.7 only for the final synthesis step where quality matters most. That’s a legitimate cost optimization strategy.
Full Architecture Summary
| Component | Choice | Why |
|---|---|---|
| Agent Framework | LangGraph | Explicit state, debuggable, production-ready |
| LLM | Claude 3.5 Sonnet | Best tool use + cost balance in 2026 |
| Search Tool | Tavily | Clean structured results, built for LLM agents |
| Persistence | LangGraph + SQLite/Postgres | Built-in checkpointing, thread isolation |
| Observability | LangSmith | Full trace visibility, free tier available |
| Serving | FastAPI | Simple, async-native, easy to deploy |
| Hosting | DigitalOcean | Predictable pricing, good for stateful workloads |
What to Build Next
Once you’ve got this agent running, the natural extensions are:
- **Human-in-the-loop:** LangGraph's `interrupt_before` parameter lets you pause execution and wait for human approval before continuing. Critical for any agent that takes real-world actions.
- **Parallel tool calls:** Claude 3.5+ supports parallel tool calling, so you can fan out to multiple searches simultaneously and join the results, cutting multi-search latency by 60-70%.
- **Subgraphs:** For complex agents, you can nest graphs inside graphs. A planning agent can spawn specialized subgraph agents for different tasks.
- **Streaming:** Use `app.astream_events()` instead of `app.invoke()` to stream tokens to the frontend as they're generated. Essential for any user-facing product.
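The fan-out/join shape of parallel searches is plain asyncio underneath. Here's a sketch with a stand-in `search` coroutine; in a real agent, an async Tavily or tool-node call would take its place:

```python
import asyncio

async def search(query: str) -> str:
    """Stand-in for an async search call (e.g. an async Tavily client)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"results for {query!r}"

async def fan_out(queries: list[str]) -> list[str]:
    """Run all searches concurrently and join the results in order."""
    return await asyncio.gather(*(search(q) for q in queries))

results = asyncio.run(fan_out(["langgraph checkpoints", "claude tool use"]))
print(len(results))  # 2
```

Because `asyncio.gather` preserves input order, you can zip results back to their queries when building the tool messages.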
For a broader look at where this fits in the developer AI ecosystem, our best AI tools for developers roundup covers the full stack beyond just agents.
Final Recommendation
If you’re serious about building AI agents that work reliably in production — not just in demos — the LangGraph + Claude stack is the right choice right now. The graph-based execution model forces you to think clearly about state and control flow, which is exactly the discipline you need when building systems that make autonomous decisions.
Start with the code in this guide, get it running locally, add LangSmith tracing immediately (before you need to debug anything), and deploy to a simple VPS once you’re ready to share it. Don’t over-engineer the infrastructure until you have real users.
The Anthropic API has solid documentation and the langchain-anthropic integration is well-maintained. Claude’s tool use has gotten noticeably more reliable with each release in the 3.x family. This is a stack you can build on.
If you hit walls with hosting or want to understand your deployment options better, DigitalOcean’s $200 credit for new accounts is a genuinely useful way to experiment without commitment — I’ve used it to run agent infrastructure for months while validating ideas before spending real money.
Now stop reading and go build something.