How to Build an AI Agent with LangGraph and Claude

This article contains affiliate links. We may earn a commission if you sign up through our links — at no extra cost to you.

You’ve read the LangChain docs. You’ve seen the demo videos. Now you actually want to build something that works — a real AI agent that maintains state, makes decisions, calls tools, and doesn’t hallucinate its way into a support ticket. This guide covers exactly how to build an AI agent with LangGraph and Claude, end to end, with working code and honest opinions about where things break.

Why this stack specifically? LangGraph handles the hard part of stateful agent orchestration — the loops, the branching, the memory between steps. Claude (especially Claude 3.5 Sonnet and 3.7 Sonnet) is currently the best model for agents that need to reason carefully before acting. It follows tool call schemas reliably, handles long context without losing the thread, and is noticeably less likely to go rogue than some alternatives. I’ve built agents on GPT-4o, Gemini, and Claude — Claude wins for agentic workloads right now. (For a deeper comparison, see Claude vs ChatGPT for Developers.)

What You’re Building

A research assistant agent that can:

  • Accept a user question
  • Search the web (via a tool)
  • Decide whether it has enough information or needs to search again
  • Return a grounded, cited answer

This is the canonical “ReAct” loop — Reason, Act, Observe, repeat. It’s simple enough to understand fully but complex enough that you’ll hit every real pain point: state management, tool errors, infinite loops, and token costs.

Prerequisites

  • Python 3.11+
  • An Anthropic API key (get one at console.anthropic.com)
  • Basic familiarity with Python async and type hints
  • pip install langgraph langchain-anthropic langchain-community tavily-python (TavilySearchResults lives in langchain-community)

For web search, I’m using Tavily — it’s purpose-built for LLM agents and returns clean structured results. A free tier exists. You could swap in SerpAPI or a custom scraper, but Tavily is the path of least resistance here.

Understanding LangGraph’s Core Concepts (Quickly)

LangGraph models your agent as a directed graph. Here’s what that means in practice:

  • State: A typed dictionary that persists across every node in the graph. Think of it as the agent’s working memory.
  • Nodes: Functions that take state, do something (call an LLM, run a tool, transform data), and return an updated state.
  • Edges: Connections between nodes. They can be static (always go from A to B) or conditional (go to B or C depending on state).
  • Checkpointing: LangGraph can persist state to a database between runs. This is what enables long-running agents, human-in-the-loop, and resumable workflows.

The mental model that clicked for me: LangGraph is a state machine where the LLM gets to influence which transition fires next. That’s it. Everything else is plumbing.
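To make that concrete, here's the same loop hand-rolled in plain Python, with no LangGraph at all. The `decide` and `run_tool` functions are stand-in stubs for the LLM call and the search tool, not real API calls:

```python
# A hand-rolled version of the loop LangGraph formalizes: the "model"
# picks the next transition, the runtime executes it. Stubs only.
def decide(state):
    # Stand-in for an LLM call: keep searching until we have two results
    return "tools" if len(state["results"]) < 2 else "end"

def run_tool(state):
    # Stand-in for a real tool call
    state["results"].append(f"result-{len(state['results'])}")
    return state

def run(state):
    while decide(state) == "tools":   # conditional edge
        state = run_tool(state)       # tool node, then back to the agent
    return state

final = run({"results": []})
print(final["results"])  # ['result-0', 'result-1']
```

Everything LangGraph adds on top of this ten-line loop — typed state, checkpointing, tracing hooks — is the plumbing that makes it maintainable.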

Get the dev tool stack guide

A weekly breakdown of the tools worth your time — and the ones that aren’t. Join 500+ developers.



No spam. Unsubscribe anytime.

Step 1: Define Your Agent State

from typing import TypedDict, Annotated, List
from langgraph.graph.message import add_messages
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
    messages: Annotated[List[BaseMessage], add_messages]
    search_count: int
    final_answer: str | None

The add_messages annotation is important — it tells LangGraph to append new messages rather than overwrite the list. search_count is our circuit breaker; we’ll use it to prevent infinite search loops. final_answer is a slot for a dedicated answer-generation node; in this minimal version we’ll simply read the last message instead.
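To see why the annotation matters, here's a plain-Python sketch of the two merge behaviors (simplified stand-ins, not LangGraph's actual reducer implementation):

```python
# Default state merge: a node's return value replaces the old value.
def overwrite(old, new):
    return new

# add_messages-style reducer: new messages are appended instead.
def append_reducer(old, new):
    return old + new

history = ["user: question"]
update = ["assistant: tool_call"]

print(overwrite(history, update))       # history is lost
print(append_reducer(history, update))  # history is preserved
```

Without the reducer, every node that returns a `messages` key would wipe the conversation so far — which is almost never what you want in an agent loop.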

Step 2: Set Up Claude and Your Tools

import os
from langchain_anthropic import ChatAnthropic
from langchain_community.tools.tavily_search import TavilySearchResults

# Use Claude 3.5 Sonnet for the best cost/performance ratio on agents
llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0,  # Deterministic for tool calls
    api_key=os.environ["ANTHROPIC_API_KEY"]
)

search_tool = TavilySearchResults(
    max_results=3  # reads TAVILY_API_KEY from the environment
)

tools = [search_tool]
llm_with_tools = llm.bind_tools(tools)

Set temperature=0 for agents. I cannot stress this enough. You want deterministic tool call decisions, not creative ones. Save the temperature for the final answer generation step if you need it.

Why Claude 3.5 Sonnet over 3.7? For most agent workloads, 3.5 Sonnet is faster and cheaper while matching 3.7’s tool-calling accuracy. Use 3.7 Sonnet when your agent needs extended thinking — complex multi-step reasoning where you want it to slow down before acting. For a straightforward research loop, 3.5 Sonnet is the right call.

Step 3: Build the Graph Nodes

from langchain_core.messages import ToolMessage
import json

def call_model(state: AgentState) -> AgentState:
    """The reasoning node — Claude decides what to do next."""
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def run_tools(state: AgentState) -> AgentState:
    """Execute whatever tool calls Claude requested."""
    last_message = state["messages"][-1]
    tool_results = []
    
    for tool_call in last_message.tool_calls:
        if tool_call["name"] == "tavily_search_results_json":
            result = search_tool.invoke(tool_call["args"])
            tool_results.append(
                ToolMessage(
                    content=json.dumps(result),
                    tool_call_id=tool_call["id"]
                )
            )
    
    return {
        "messages": tool_results,
        "search_count": state["search_count"] + 1
    }

Step 4: Add Conditional Routing

This is where LangGraph earns its keep. We need logic that says: “Did Claude call a tool? Route to tool execution. Did Claude give a final answer? We’re done. Have we searched too many times? Force a stop.”

from typing import Literal

def should_continue(state: AgentState) -> Literal["tools", "end"]:
    last_message = state["messages"][-1]
    
    # Hard stop after 5 searches — adjust based on your use case
    if state["search_count"] >= 5:
        return "end"
    
    # If Claude made tool calls, execute them
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    
    # No tool calls = Claude is done reasoning
    return "end"

That search_count >= 5 guard is not optional. Without it, you will eventually hit an agent that loops forever because it keeps deciding it needs more information. It happens. Put the guard in.

Step 5: Assemble the Graph

from langgraph.graph import StateGraph, END

def build_agent():
    graph = StateGraph(AgentState)
    
    # Add nodes
    graph.add_node("agent", call_model)
    graph.add_node("tools", run_tools)
    
    # Entry point
    graph.set_entry_point("agent")
    
    # Conditional routing from the agent node
    graph.add_conditional_edges(
        "agent",
        should_continue,
        {
            "tools": "tools",
            "end": END
        }
    )
    
    # After tool execution, always go back to the agent
    graph.add_edge("tools", "agent")
    
    return graph.compile()

agent = build_agent()

Step 6: Run It

from langchain_core.messages import HumanMessage

def run_agent(question: str) -> str:
    initial_state = {
        "messages": [HumanMessage(content=question)],
        "search_count": 0,
        "final_answer": None
    }
    
    result = agent.invoke(initial_state)
    
    # The last message is Claude's final response
    return result["messages"][-1].content

# Try it
answer = run_agent(
    "What are the main differences between LangGraph and AutoGen for building AI agents?"
)
print(answer)

If everything is wired up correctly, you’ll see Claude search, read the results, possibly search again, and return a grounded answer with sources. The whole round trip typically takes 5-15 seconds depending on search latency.

Adding Persistence (The Feature That Actually Matters)

A stateless agent is a toy. Real agents need to remember context across sessions. LangGraph’s checkpointing makes this straightforward:

# Checkpointers ship separately: pip install langgraph-checkpoint-sqlite
from langgraph.checkpoint.sqlite import SqliteSaver

def build_agent_with_checkpointer(checkpointer):
    """Identical graph to build_agent, compiled with persistence attached."""
    graph = StateGraph(AgentState)
    graph.add_node("agent", call_model)
    graph.add_node("tools", run_tools)
    graph.set_entry_point("agent")
    graph.add_conditional_edges(
        "agent", should_continue, {"tools": "tools", "end": END}
    )
    graph.add_edge("tools", "agent")
    return graph.compile(checkpointer=checkpointer)

# For production, swap SqliteSaver for PostgresSaver
with SqliteSaver.from_conn_string("agent_memory.db") as checkpointer:
    agent = build_agent_with_checkpointer(checkpointer)
    
    # thread_id groups messages into a conversation
    config = {"configurable": {"thread_id": "user-123-session-1"}}
    
    # First message
    result1 = agent.invoke(
        {"messages": [HumanMessage(content="What is LangGraph?")], 
         "search_count": 0, "final_answer": None},
        config=config
    )
    
    # Follow-up — the agent remembers the previous exchange
    result2 = agent.invoke(
        {"messages": [HumanMessage(content="How does it compare to plain LangChain?")],
         "search_count": 0, "final_answer": None},
        config=config
    )

For production, use PostgresSaver or AsyncSqliteSaver. SQLite works fine for local development but will bottleneck under concurrent users. If you’re deploying this as a service, you’ll want a proper VPS — DigitalOcean’s managed Postgres pairs well here and their $200 credit for new accounts covers a solid amount of experimentation.

Debugging With LangSmith

Here’s an honest take: debugging LangGraph agents without tracing is miserable. You’re staring at a stream of messages trying to figure out why the agent made a wrong turn on step 3 of 7. LangSmith fixes this.

Add these environment variables and you get full trace visualization in the LangSmith UI — every node, every LLM call, every tool result, latency, and token count:

export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=your_langsmith_key
export LANGCHAIN_PROJECT=my-research-agent

That’s it. No code changes. LangSmith’s free tier covers 5,000 traces/month, which is plenty for development. Once you’re in production and need to debug a specific user complaint, you’ll search by thread ID and see exactly what happened. This is not optional infrastructure — it’s how you actually maintain agents.

Common Failures and How to Fix Them

The Agent Loops Forever

You forgot the circuit breaker. Add search_count tracking and a hard stop. Also check whether your tool is returning empty results — Claude sometimes keeps searching when it gets nothing back.
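One mitigation for the empty-result case: substitute an explicit sentinel message when the search comes back empty, so Claude can reason about the failure instead of retrying blindly. A sketch — the `format_tool_result` helper is hypothetical, something you'd call from `run_tools` before building the ToolMessage:

```python
import json

def format_tool_result(result):
    # An empty result list is a loop trigger: tell the model explicitly
    # what happened instead of handing it "[]" to puzzle over.
    if not result:
        return json.dumps({
            "status": "no_results",
            "hint": "Try a broader query, or answer from what you already have."
        })
    return json.dumps(result)

print(format_tool_result([]))
print(format_tool_result([{"url": "https://example.com", "content": "..."}]))
```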

Tool Call Schema Errors

Claude is strict about tool schemas. If you’re getting validation errors, print llm_with_tools.kwargs["tools"] and verify the JSON schema matches what you’re passing. The most common mistake is using Python types directly instead of JSON Schema types.
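For reference, the distinction looks like this — JSON Schema uses its own primitive names, not Python's (both schemas below are illustrative hand-written examples):

```python
# Wrong: "str" and "int" are Python type names, not JSON Schema types
bad_schema = {"type": "object",
              "properties": {"query": {"type": "str"},
                             "max_results": {"type": "int"}}}

# Right: JSON Schema primitives are "string", "integer", "number", "boolean"
good_schema = {"type": "object",
               "properties": {"query": {"type": "string"},
                              "max_results": {"type": "integer"}},
               "required": ["query"]}
```

If you define tools with LangChain's @tool decorator and Python type hints, this translation is handled for you; the mistake usually creeps in when schemas are written by hand.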

Context Window Overflows

Long agent runs accumulate messages fast. You’ll eventually hit Claude’s context limit (200k tokens for 3.5/3.7 Sonnet), and cost grows with every token you carry forward on each turn. Add a message-trimming step that keeps the original system prompt plus the last N messages. LangChain ships a trim_messages utility (in langchain_core.messages) for exactly this.
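If you'd rather hand-roll it, the core logic is simple list surgery — keep the first (system) message plus the most recent N. A simplified sketch that trims by message count rather than tokens:

```python
def trim_history(messages, keep_last=6):
    # Preserve the system prompt plus the tail of the conversation.
    if len(messages) <= keep_last + 1:
        return messages
    return [messages[0]] + messages[-keep_last:]

history = ["system"] + [f"turn-{i}" for i in range(20)]
trimmed = trim_history(history)
print(trimmed)  # ['system', 'turn-14', 'turn-15', 'turn-16', 'turn-17', 'turn-18', 'turn-19']
```

In practice, trim on token counts rather than message counts, and take care not to separate an assistant tool call from its matching tool result — the API will reject a conversation that starts with an orphaned tool message.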

Slow Performance

If your agent is taking 30+ seconds per run, the bottleneck is almost always tool latency, not the LLM. Profile your tool calls. Tavily is usually fast; custom scrapers are usually slow. Consider running tool calls in parallel when the agent requests multiple tools in one step.
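When Claude requests several tool calls in one turn, you can fan them out with a thread pool. A sketch, with `slow_tool` standing in for a real network-bound search call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_tool(query):
    time.sleep(0.1)  # stand-in for network latency
    return f"results for {query}"

queries = ["langgraph docs", "autogen docs", "crewai docs"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # Runs all three "searches" concurrently, preserving input order
    results = list(pool.map(slow_tool, queries))
elapsed = time.perf_counter() - start

print(results)
print(f"{elapsed:.2f}s")  # ~0.1s instead of ~0.3s sequential
```

Threads are fine here because tool calls are I/O-bound; swap in asyncio if your tools already expose async clients.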

Deploying Your Agent

For a simple HTTP wrapper, FastAPI works well:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    question: str
    thread_id: str = "default"

@app.post("/agent/query")
async def query_agent(request: QueryRequest):
    answer = run_agent(request.question)
    return {"answer": answer, "thread_id": request.thread_id}

For hosting, you have options. LangGraph Cloud (LangChain’s managed platform, now branded LangGraph Platform) handles scaling and persistence for you — worth it if you’re building a product. For side projects or internal tools, a small VPS is cheaper. I’ve had good results with DigitalOcean droplets for this kind of workload — a $12/month droplet handles a surprising amount of traffic for a single-agent service. See our best cloud hosting for side projects breakdown for a fuller comparison.

If you want to integrate your agent with MCP servers to expand its tool access (file systems, databases, APIs), check out our guide to best MCP servers for coding agents — it pairs naturally with this setup.

When to Use This Stack vs. Alternatives

| Use Case | LangGraph + Claude | Alternative |
|---|---|---|
| Stateful multi-step agents | ✅ Excellent | AutoGen (more complex setup) |
| Human-in-the-loop workflows | ✅ First-class support | CrewAI (less granular control) |
| Simple one-shot Q&A | ⚠️ Overkill | Direct Claude API call |
| Multi-agent coordination | ✅ Supported via subgraphs | AutoGen (purpose-built for this) |
| Production reliability | ✅ Checkpointing + LangSmith | Most alternatives lack this |
| Rapid prototyping | ⚠️ Some boilerplate | Pydantic AI (less setup) |

Final Recommendation

LangGraph + Claude is the right stack for production agents in 2026. It’s not the fastest to prototype with — you’ll write more boilerplate than with simpler frameworks — but that boilerplate buys you explicit state management, proper error handling, and observability. Those aren’t nice-to-haves; they’re what separates a demo from something you can actually maintain.

Start with the exact pattern above. Get it working. Then add complexity: more tools, subgraphs for specialized tasks, human-in-the-loop interrupts for high-stakes decisions. LangGraph scales with you in a way that “magic” frameworks don’t.

The one thing I’d change if starting fresh: instrument with LangSmith from day one, not as an afterthought. You’ll thank yourself the first time an agent does something unexpected in production and you can actually see why.

For more on how Claude stacks up as your agent’s brain, read our detailed Claude vs ChatGPT developer review. And if you’re evaluating the broader AI tooling ecosystem, our best AI tools for developers roundup covers what’s actually worth your time in 2026.
