LangSmith Deployment

Start here: need to register a service and create a plan first? Follow the 5-minute setup.

Runnable tutorial

langchain-langsmith-deployment-py — a deliberately minimal LangGraph agent with POST /threads/{id}/runs/wait gated by Nevermined x402. Clone, fill in .env, run poetry run buyer to drive the full 402 → token-acquisition → settlement round-trip in five numbered steps.

Add payment protection to a LangGraph agent deployed to LangSmith Deployment (the rebrand of LangGraph Platform) using the x402 protocol. This is the deployment-time alternative to the per-tool @requires_payment decorator covered in LangChain — both can coexist; they protect different layers.

| Layer | Tool-time (LangChain) | Deployment-time (this page) |
| --- | --- | --- |
| Code surface | `@requires_payment` on individual `@tool` functions | `PaymentMiddleware` mounted via `langgraph.json` `http.app` |
| Gated unit | A single tool call inside the agent | The agent’s HTTP entry point (e.g. `POST /threads/{id}/runs/wait`) |
| Charge frequency | Once per tool invocation | Once per HTTP request to the deployment |
| Runtime | Any LangChain / LangGraph host | LangSmith Deployment, `langgraph dev`, `langgraph up` |

Install

pip install "payments-py[langsmith]"

The [langsmith] extra pulls in fastapi, starlette, and langsmith. The quotes keep shells such as zsh from treating the square brackets as a glob pattern.

Python only. LangSmith Deployment’s custom-app surface is documented by LangChain as Python-only. A TypeScript variant is tracked in our LangChain integration epic but blocked on LangChain shipping a TS runtime.

Define the middleware app

Create nvm_app.py next to your langgraph.json. The glue is a Payments instance plus a route map:

# nvm_app.py
import os
from payments_py import Payments, PaymentOptions
from payments_py.langsmith import build_payment_app, RouteConfig

payments = Payments.get_instance(
    PaymentOptions(
        nvm_api_key=os.environ["NVM_API_KEY"],
        environment=os.environ.get("NVM_ENVIRONMENT", "sandbox"),
    )
)

app = build_payment_app(
    payments=payments,
    routes={
        "POST /threads/{thread_id}/runs/wait": RouteConfig(
            plan_id=os.environ["NVM_PLAN_ID"],
            credits=int(os.environ.get("NVM_CREDITS_PER_INVOKE", "1")),
        ),
    },
)

build_payment_app returns a FastAPI app pre-wired with PaymentMiddleware. Mount it from langgraph.json:

{
  "graphs": { "my_agent": "./src/agent.py:graph" },
  "http": { "app": "./nvm_app.py:app" },
  "env": ".env"
}

That’s the whole integration. langgraph dev (local) and langgraph up (Docker) both honor the http.app field; the middleware composes around LangSmith Deployment’s built-in routes (/runs, /threads/{id}/runs, /assistants, etc.).

Why FastAPI? Some langgraph-api versions crash on plain Starlette http.app wrappers due to an upstream OpenAPI generation bug. FastAPI takes a clean path through app.openapi(). The build_payment_app factory returns a FastAPI app so you do not need to know about this — PaymentMiddleware itself is a BaseHTTPMiddleware and works on both.

The 402 round-trip

# 1. Create a thread (unprotected)
THREAD=$(curl -s -X POST http://127.0.0.1:2024/threads \
  -H 'content-type: application/json' -d '{}' | jq -r .thread_id)

# 2. First attempt without payment-signature → 402 + envelope
curl -i -X POST "http://127.0.0.1:2024/threads/$THREAD/runs/wait" \
  -H 'content-type: application/json' \
  -d '{"assistant_id":"my_agent","input":{"messages":[{"type":"human","content":"hello"}]}}'
# HTTP/1.1 402 Payment Required
# payment-required: eyJ4NDAyVmVyc2lvbi...   ← base64-encoded x402 envelope
# {"error":"Payment Required","message":"Missing x402 payment token..."}

# 3. Acquire an x402 token from the envelope's plan_id (via payments-py)
# 4. Retry with the payment-signature header → 200 + settlement receipt
curl -i -X POST "http://127.0.0.1:2024/threads/$THREAD/runs/wait" \
  -H 'content-type: application/json' \
  -H "payment-signature: $TOKEN" \
  -d '{"assistant_id":"my_agent","input":{"messages":[{"type":"human","content":"hello"}]}}'
# HTTP/1.1 200 OK
# payment-response: eyJzdWNjZXNzIjp0cn...   ← base64-encoded SettleResponse
# {"messages":[{"type":"human","content":"hello"},{"type":"ai","content":"<agent reply>"}]}

Steps 3-4 in real client code: see the buyer script in the tutorial. The buyer uses payments_py.x402.resolve_scheme.resolve_network to pick the right enrolled payment method from the plan metadata.
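Both headers in the transcript above are described as base64-encoded JSON, so a client can inspect them before acquiring a token. The following is a minimal standard-library sketch. The helper name decode_x402_header and the sample payloads are illustrative assumptions, not part of payments-py, and a real envelope carries more fields than shown here.

```python
import base64
import json


def decode_x402_header(value: str) -> dict:
    """Decode a base64-encoded JSON header such as payment-required."""
    # Restore any padding stripped in transit. urlsafe_b64decode also
    # accepts the standard "+" and "/" alphabet.
    padded = value + "=" * (-len(value) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# Hypothetical sample values, built locally for illustration only.
sample_envelope = base64.b64encode(json.dumps({"x402Version": 2}).encode()).decode()
sample_receipt = base64.b64encode(json.dumps({"success": True}).encode()).decode()
```

In a real client, the value passed in would be the payment-required header from the 402 response or the payment-response header from the 200 response.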

Per-route pricing

RouteConfig accepts a static int or a callable for credits:

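from payments_py.langsmith import build_payment_app, RouteConfig

# `payments` is the Payments instance created in nvm_app.py above.
# `estimate_credits` stands in for your own pricing function; it is not imported from payments_py here.
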
app = build_payment_app(
    payments=payments,
    routes={
        "POST /threads/{thread_id}/runs/wait": RouteConfig(
            plan_id="plan-cheap", credits=1,
        ),
        # Dynamic credits — sync or async callable
        "POST /threads/{thread_id}/runs/stream": RouteConfig(
            plan_id="plan-premium",
            credits=lambda req: estimate_credits(req),
        ),
    },
)

Path parameters work with either Starlette :param or FastAPI/LangGraph {param} syntax — both match by position. Routes not listed pass through ungated.
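To make "match by position" concrete, here is one way such matching could be written: every parameter segment, whatever its name or syntax, becomes a single-segment wildcard. This is a sketch under that assumption, not the middleware's actual matcher; compile_route and route_matches are hypothetical names.

```python
from __future__ import annotations

import re

# Matches either "{name}" or ":name" path parameters.
_PARAM = re.compile(r"\{[^/{}]+\}|:[A-Za-z_][A-Za-z0-9_]*")


def compile_route(pattern: str) -> tuple[str, re.Pattern[str]]:
    """Split "METHOD /path" and turn each parameter into a one-segment wildcard."""
    method, path = pattern.split(" ", 1)
    regex, pos = "", 0
    for match in _PARAM.finditer(path):
        # Literal text before the parameter, then "anything but a slash".
        regex += re.escape(path[pos:match.start()]) + r"[^/]+"
        pos = match.end()
    regex += re.escape(path[pos:])
    return method.upper(), re.compile(regex)


def route_matches(pattern: str, method: str, path: str) -> bool:
    """Return True when the request method and path fit the route pattern."""
    wanted_method, regex = compile_route(pattern)
    return wanted_method == method.upper() and regex.fullmatch(path) is not None
```

Because parameter names are discarded, "POST /threads/{thread_id}/runs/wait" and "POST /threads/:id/runs/wait" accept the same requests in this sketch.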

Lifecycle

The middleware implements the canonical x402 verify → agent runs → settle ordering inside one HTTP cycle. Failed agent runs (non-2xx) skip settlement so buyers are not charged. Settle failures after a successful 2xx are logged but do not surface to the client — the buyer already received the value. For the full step-by-step diagram, see chapter 13 of the SDK docs.
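The ordering described above can be sketched independently of any framework. Everything in this block is a hypothetical stand-in: Response, gated_call, and the verify, run_agent, and settle callables are illustrative names, not the SDK's API. The sketch only shows the three rules: verify first, skip settlement on a non-2xx result, and log rather than surface a settlement failure.

```python
from __future__ import annotations

import asyncio
import logging
from dataclasses import dataclass, field
from typing import Awaitable, Callable

log = logging.getLogger("nvm.sketch")


@dataclass
class Response:
    status: int
    body: str = ""
    headers: dict = field(default_factory=dict)


async def gated_call(
    token: str | None,
    verify: Callable[[str], Awaitable[bool]],
    run_agent: Callable[[], Awaitable[Response]],
    settle: Callable[[str], Awaitable[str]],
) -> Response:
    # 1. Verify before any work is done.
    if not token or not await verify(token):
        return Response(402, "Payment Required")
    # 2. Run the agent.
    response = await run_agent()
    # 3. Failed runs skip settlement, so the buyer is not charged.
    if not 200 <= response.status < 300:
        return response
    # 4. Settle; a failure here is logged and the 2xx is still returned.
    try:
        response.headers["payment-response"] = await settle(token)
    except Exception:
        log.exception("settlement failed after a successful run")
    return response
```

Keeping settlement after the agent call is what ties the charge to delivered work, which is why the whole cycle has to fit inside one HTTP request.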

Why `/runs/wait` specifically

LangSmith Deployment exposes three run-execution shapes:

| Endpoint | Behavior | Works with this middleware? |
| --- | --- | --- |
| `POST /threads/{id}/runs/wait` | Synchronous; blocks until the agent finishes, returns final state | Yes: the only path that fits verify-then-work-then-settle in one HTTP cycle |
| `POST /threads/{id}/runs` | Background; returns 202 immediately with a `run_id` | No: settle would fire before the agent did the work |
| `POST /threads/{id}/runs/stream` | Server-sent events; streams agent output | Partially: the middleware buffers the response body to attach the settlement header, which negates streaming |

Gate /runs/wait for a clean demo. The middleware will pass through /threads, /assistants/search, /info, /ok, and other non-billable endpoints automatically.

Observability

When LANGSMITH_TRACING=true is set, two top-level traces appear per gated request: the middleware's parent trace with its two child spans, and LangGraph's own trace for the graph run:

nvm:x402-request            ← middleware parent trace
├─ nvm:verify                ← child, nvm.* metadata (plan_ids, scheme, network, payer, payment_token abbreviated)
└─ nvm:settlement            ← child, nvm.* metadata (credits_redeemed, balance.after, tx_hash)

my_agent                    ← LangGraph's separate trace (sibling, not nested)

The graph’s trace appears as a top-level sibling because langgraph-api starts it at the graph-invocation boundary, independent of the middleware’s trace context. Both nvm spans and the parent carry searchable nvm.* metadata; the raw payment-signature token is abbreviated to a prefix and suffix (for example eyJ4NDAyVmVyc2lvb…bsig) so it can be cross-referenced without being exposed. Verification failures raise PaymentRequiredError inside verify_span, so LangSmith marks both the parent and the verify child as failed via the canonical context-manager exit path. Settle failures after a successful 2xx mark only the settlement child as failed; the parent stays successful, matching the buyer-visible 200.
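The token abbreviation mentioned above can be pictured as keeping a short prefix and suffix and eliding the middle. The helper below is a hypothetical sketch; the function name and the 17/4 character split are assumptions inferred from the example string, not the middleware's actual code.

```python
def abbreviate_token(token: str, head: int = 17, tail: int = 4) -> str:
    """Keep the first `head` and last `tail` characters, eliding the rest."""
    if len(token) <= head + tail:
        # Too short to abbreviate without revealing the whole value.
        return "…"
    return f"{token[:head]}…{token[-tail:]}"
```

The abbreviated form is enough to correlate a trace with a specific request while keeping the usable token out of trace metadata.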

Host a chat UI on top of the deployment

Runnable tutorial

langchain-chat-ui-nvm — a Next.js fork of LangChain’s agent-chat-ui with a card-delegation popup and an x402-aware proxy. Pairs with the langchain-research-agent-py companion (in-tool @requires_payment) for a full browser demo.

The CLI buyer above is enough to validate the protocol, but most demos want a face. The langchain-chat-ui-nvm tutorial does this by forking langchain-ai/agent-chat-ui and adding a handful of Next.js API routes plus a popup target.

This section describes a chat-UI host built on top of the in-tool gating pattern (@requires_payment from LangChain) rather than the route-level middleware on the rest of this page. The middleware gates every HTTP request — fine for “every call is paid” pricing, but it forces users to pay before they can ask the agent what it does. The chat-UI flow benefits from letting the LLM act as a free concierge (introspection, capability discovery) and only charging when a paid tool actually fires. Pick by UX: the same PaymentMiddleware would work if you want a hard paywall in front of /runs/stream.

The flow:

1. The user opens the chat and clicks Authorize on a top banner. A popup opens at https://embed.<tier-host>/cards/setup?sessionToken=…&returnUrl=…/x402-callback&state=… (e.g. embed.nevermined.dev for staging, embed.nevermined.app for production; this is the standalone embed app that replaced the webapp’s removed /embed/* routes).
2. They enrol a card on the embed page (Stripe, Braintree, or Visa Intelligent Commerce), pick a budget (e.g. $10 / 24 h), and submit.
3. Nevermined redirects the popup back to the chat UI’s callback page with paymentMethodId, delegationId, and the round-tripped state nonce. The callback validates state, postMessages the IDs to window.opener, then closes itself.
4. The chat UI’s Next.js server mints an x402 access token from the delegationId (payments.x402.getX402AccessToken(planId, agentId, { delegationConfig: { delegationId } }), pattern B in @nevermined-io/payments) and stores it in an httpOnly cookie. The browser never sees the raw token.
5. From then on, the catch-all /api/[..._path] proxy reads the cookie on every outgoing LangGraph run and JSON-injects it into the run body at config.configurable.payment_token, the contract @requires_payment reads from. The agent’s tool runs verify → execute → settle internally; the LLM concierge handles everything else for free.
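The injection in the last step comes down to one nested key in the run body. The tutorial's proxy is a Next.js route; the Python below only illustrates the body shape, and inject_payment_token is a hypothetical helper name rather than code from the tutorial.

```python
import copy


def inject_payment_token(run_body: dict, token: str) -> dict:
    """Return a copy of the run body with the token at config.configurable.payment_token."""
    body = copy.deepcopy(run_body)  # leave the caller's dict untouched
    config = body.get("config") or {}
    configurable = config.get("configurable") or {}
    configurable["payment_token"] = token
    config["configurable"] = configurable
    body["config"] = config
    return body
```

Any existing config.configurable keys in the run body are preserved; only payment_token is added or overwritten.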

The NVM_API_KEY lives only on the Next.js server. The browser holds, at most, the short-lived sessionToken for the duration of the popup. Because gating is in-graph (no http.app on langgraph.json), neither /runs/wait nor /runs/stream needs to be in any routes map: the chat UI’s useStream hits /runs/stream and the tool itself decides whether to charge.

Browser → Next.js proxy → LangGraph (vanilla, no middleware)
                              └─ tool ─ @requires_payment ── facilitator
Browser → /api/x402/session → POST /api/v1/widgets/session/self  (mint widget session)
Browser → popup → embed.<tier>/cards/setup                       (user authorizes)
Browser ← postMessage(delegationId) ← /x402-callback              (popup closes)
Browser → /api/x402/token → mints x402 access token, sets cookie

Important constraints:

The widget session uses the self-mint endpoint (POST /api/v1/widgets/session/self), which restricts returnUrl to localhost / 127.0.0.1 / [::1]. Deploying the chat UI to a real domain requires the widget-key flow instead.
The agent’s paid tool is bound to a single plan in this demo (the chat UI reads NVM_PLAN_ID from env, the tool uses the matching plan id in @requires_payment(plan_id=…)). Multi-tool agents with mixed pricing need one accepts[] entry per paid tool.
The popup pattern requires the chat UI and the callback page to share an origin (both are served by Next.js).

Full setup, troubleshooting, and architecture notes live in the tutorial README.

Combining with `@requires_payment`

The middleware and the LangChain decorator can be used together — the middleware gates the agent’s HTTP entry point, the decorator gates individual tools inside the agent. Each layer charges independently. Common pattern: charge a flat rate per agent invocation (middleware) plus dynamic per-tool credits (decorator) for expensive tool calls. The chat-UI tutorial above is an example of using the decorator alone at deployment-time (no middleware on the HTTP layer). That choice is driven by the UX — free introspection, paid execution — and it composes cleanly with vanilla LangSmith Deployment because the agent doesn’t need a custom http.app.

Limitations

Streaming responses are buffered. The middleware reads the downstream response body in full before attaching the payment-response settlement header. SSE / /runs/stream endpoints become blocking-then-bulk. Gate /runs/wait only, or accept the trade-off.
Python only. TypeScript variant tracked but blocked on LangChain shipping a TS runtime.
Sync I/O is wrapped. The four sync SDK calls (resolve_scheme, resolve_network, verify_permissions, settle_permissions) run via asyncio.to_thread(...) so they don’t block the event loop. langgraph dev’s blocking-call detector treats unwrapped sync HTTP as fatal warnings.
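The wrapping pattern named in the last item looks like this in isolation. blocking_verify is a hypothetical stand-in for a synchronous SDK call, with time.sleep simulating network I/O; only asyncio.to_thread itself is the real mechanism.

```python
import asyncio
import time


def blocking_verify(token: str) -> bool:
    """Stand-in for a synchronous SDK call that performs network I/O."""
    time.sleep(0.05)  # simulates a blocking HTTP round-trip
    return token.startswith("eyJ")


async def verify_async(token: str) -> bool:
    # Run the blocking call in a worker thread so the event loop keeps
    # serving other requests while it waits.
    return await asyncio.to_thread(blocking_verify, token)
```

asyncio.to_thread requires Python 3.9 or later.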
