Start here: need to register a service and create a plan first? Follow the 5-minute setup.

Runnable tutorial

langchain-langsmith-deployment-py — a deliberately minimal LangGraph agent with POST /threads/{id}/runs/wait gated by Nevermined x402. Clone, fill in .env, run poetry run buyer to drive the full 402 → token-acquisition → settlement round-trip in five numbered steps.
Add payment protection to a LangGraph agent deployed to LangSmith Deployment (the rebrand of LangGraph Platform) using the x402 protocol. This is the deployment-time alternative to the per-tool @requires_payment decorator covered in LangChain — both can coexist; they protect different layers.
| Layer | Tool-time (LangChain) | Deployment-time (this page) |
| --- | --- | --- |
| Code surface | @requires_payment on individual @tool functions | PaymentMiddleware mounted via langgraph.json http.app |
| Gated unit | A single tool call inside the agent | The agent’s HTTP entry point (e.g. POST /threads/{id}/runs/wait) |
| Charge frequency | Once per tool invocation | Once per HTTP request to the deployment |
| Runtime | Any LangChain / LangGraph host | LangSmith Deployment, langgraph dev, langgraph up |

Install

pip install "payments-py[langsmith]"
The [langsmith] extra pulls fastapi, starlette, and langsmith.
Python only. LangSmith Deployment’s custom-app surface is documented by LangChain as Python-only. A TypeScript variant is tracked in our LangChain integration epic but blocked on LangChain shipping a TS runtime.

Define the middleware app

Create nvm_app.py next to your langgraph.json. It needs only a few lines of glue:
# nvm_app.py
import os
from payments_py import Payments, PaymentOptions
from payments_py.langsmith import build_payment_app, RouteConfig

payments = Payments.get_instance(
    PaymentOptions(
        nvm_api_key=os.environ["NVM_API_KEY"],
        environment=os.environ.get("NVM_ENVIRONMENT", "sandbox"),
    )
)

app = build_payment_app(
    payments=payments,
    routes={
        "POST /threads/{thread_id}/runs/wait": RouteConfig(
            plan_id=os.environ["NVM_PLAN_ID"],
            credits=int(os.environ.get("NVM_CREDITS_PER_INVOKE", "1")),
        ),
    },
)
build_payment_app returns a FastAPI app pre-wired with PaymentMiddleware. Mount it from langgraph.json:
{
  "graphs": { "my_agent": "./src/agent.py:graph" },
  "http": { "app": "./nvm_app.py:app" },
  "env": ".env"
}
That’s the whole integration. langgraph dev (local) and langgraph up (Docker) both honor the http.app field; the middleware composes around LangSmith Deployment’s built-in routes (/runs, /threads/{id}/runs, /assistants, etc.).
Why FastAPI? Some langgraph-api versions crash on plain Starlette http.app wrappers due to an upstream OpenAPI generation bug. FastAPI takes a clean path through app.openapi(). The build_payment_app factory returns a FastAPI app so you do not need to know about this — PaymentMiddleware itself is a BaseHTTPMiddleware and works on both.

The 402 round-trip

# 1. Create a thread (unprotected)
THREAD=$(curl -s -X POST http://127.0.0.1:2024/threads \
  -H 'content-type: application/json' -d '{}' | jq -r .thread_id)

# 2. First attempt without payment-signature → 402 + envelope
curl -i -X POST "http://127.0.0.1:2024/threads/$THREAD/runs/wait" \
  -H 'content-type: application/json' \
  -d '{"assistant_id":"my_agent","input":{"messages":[{"type":"human","content":"hello"}]}}'
# HTTP/1.1 402 Payment Required
# payment-required: eyJ4NDAyVmVyc2lvbi...   ← base64-encoded x402 envelope
# {"error":"Payment Required","message":"Missing x402 payment token..."}

# 3. Acquire an x402 token for the plan_id in the envelope (via payments-py) and store it in $TOKEN
# 4. Retry with the payment-signature header → 200 + settlement receipt
curl -i -X POST "http://127.0.0.1:2024/threads/$THREAD/runs/wait" \
  -H 'content-type: application/json' \
  -H "payment-signature: $TOKEN" \
  -d '{"assistant_id":"my_agent","input":{"messages":[{"type":"human","content":"hello"}]}}'
# HTTP/1.1 200 OK
# payment-response: eyJzdWNjZXNzIjp0cn...   ← base64-encoded SettleResponse
# {"messages":[{"type":"human","content":"hello"},{"type":"ai","content":"<agent reply>"}]}
Steps 3-4 in real client code: see the buyer script in the tutorial. The buyer uses payments_py.x402.resolve_scheme.resolve_network to pick the right enrolled payment method from the plan metadata.
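Both the payment-required and payment-response headers in the transcript above are base64-encoded JSON. A minimal sketch for inspecting them while debugging follows; it uses only the standard library, the helper name decode_x402_header is hypothetical, and the sample envelope contains a single illustrative field rather than a real server response.

```python
import base64
import json


def decode_x402_header(value: str) -> dict:
    """Decode a base64-encoded JSON header such as payment-required."""
    # Assumption: tolerate URL-safe alphabets and missing padding. The
    # middleware may well always emit standard, padded base64.
    normalized = value.strip().replace("-", "+").replace("_", "/")
    normalized += "=" * (-len(normalized) % 4)
    return json.loads(base64.b64decode(normalized))


# Illustrative envelope only; the version value is made up and real
# envelopes carry more fields.
sample = base64.b64encode(json.dumps({"x402Version": 2}).encode()).decode()
envelope = decode_x402_header(sample)
print(sample[:18], envelope)
```

The eyJ4NDAyVmVyc2lvbi prefix shown in step 2 is simply the base64 form of {"x402Version, which is a quick way to recognise an envelope in logs.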

Per-route pricing

RouteConfig accepts a static int or a callable for credits:
from payments_py.langsmith import build_payment_app, RouteConfig

app = build_payment_app(
    payments=payments,
    routes={
        "POST /threads/{thread_id}/runs/wait": RouteConfig(
            plan_id="plan-cheap", credits=1,
        ),
        # Dynamic credits: a sync or async callable (estimate_credits is your own function)
        "POST /threads/{thread_id}/runs/stream": RouteConfig(
            plan_id="plan-premium",
            credits=lambda req: estimate_credits(req),
        ),
    },
)
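To make the "static int or callable, sync or async" contract concrete, here is a sketch of how a credits value could be resolved for one request. It is not the library's implementation: resolve_credits, the dict-shaped request, and both pricing functions are hypothetical stand-ins.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Union

Credits = Union[int, Callable[[Any], Union[int, Awaitable[int]]]]


async def resolve_credits(credits: Credits, request: Any) -> int:
    """Return the credit amount to charge for one request."""
    if not callable(credits):
        return int(credits)              # static amount
    result = credits(request)            # a sync callable returns the value
    if inspect.isawaitable(result):      # an async callable returns an awaitable
        result = await result
    return int(result)


def estimate_credits(request: dict) -> int:
    # Hypothetical pricing rule: 1 credit per started 1,000 bytes of body.
    return max(1, -(-len(request["body"]) // 1000))


async def estimate_credits_async(request: dict) -> int:
    # Hypothetical async variant that doubles the sync price.
    return estimate_credits(request) * 2


fake_request = {"body": b"x" * 2500}
static_cost = asyncio.run(resolve_credits(1, fake_request))
sync_cost = asyncio.run(resolve_credits(estimate_credits, fake_request))
async_cost = asyncio.run(resolve_credits(estimate_credits_async, fake_request))
```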
Path parameters work with either Starlette :param or FastAPI/LangGraph {param} syntax — both match by position. Routes not listed pass through ungated.
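One way to read "both match by position" is that parameter names are ignored and any single path segment is accepted in that slot. The sketch below illustrates that reading with a small regex compiler; compile_route and matches are hypothetical helpers, not part of payments-py, and the real matcher may differ in details such as trailing slashes.

```python
import re
from typing import Tuple


def compile_route(pattern: str) -> Tuple[str, "re.Pattern[str]"]:
    """Turn 'POST /threads/{thread_id}/runs/wait' into (method, regex)."""
    method, path = pattern.split(" ", 1)
    segments = []
    for segment in path.strip("/").split("/"):
        is_param = segment.startswith(":") or (
            segment.startswith("{") and segment.endswith("}")
        )
        # Parameter names are discarded: any one non-empty segment matches.
        segments.append("[^/]+" if is_param else re.escape(segment))
    return method.upper(), re.compile("^/" + "/".join(segments) + "/?$")


def matches(pattern: str, method: str, path: str) -> bool:
    """Check whether a request's method and path fall under a route key."""
    route_method, regex = compile_route(pattern)
    return route_method == method.upper() and regex.match(path) is not None


braces = "POST /threads/{thread_id}/runs/wait"
colon = "POST /threads/:id/runs/wait"
print(matches(braces, "POST", "/threads/abc/runs/wait"))
print(matches(colon, "POST", "/threads/abc/runs/wait"))
```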

Lifecycle

The middleware implements the canonical x402 verify → agent runs → settle ordering inside one HTTP cycle. Failed agent runs (non-2xx) skip settlement so buyers are not charged. Settle failures after a successful 2xx are logged but do not surface to the client — the buyer already received the value. For the full step-by-step diagram, see chapter 13 of the SDK docs.
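The ordering described above can be sketched without any web framework. This is an illustration of the control flow only, under the assumption that verify, run_agent, and settle are injected coroutines; gated_call and the placeholder header values are hypothetical and do not mirror the real PaymentMiddleware signature.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Dict, Tuple

logger = logging.getLogger("nvm.sketch")
Response = Tuple[int, Dict[str, str]]  # (status code, headers)


async def gated_call(
    token: str,
    verify: Callable[[str], Awaitable[bool]],
    run_agent: Callable[[], Awaitable[Response]],
    settle: Callable[[str], Awaitable[str]],
) -> Response:
    # 1. Verify before any work happens; reject with 402 otherwise.
    if not token or not await verify(token):
        return 402, {"payment-required": "<envelope>"}
    # 2. Run the agent.
    status, headers = await run_agent()
    # 3. Settle only after a 2xx, so failed runs are never charged.
    if 200 <= status < 300:
        try:
            headers = {**headers, "payment-response": await settle(token)}
        except Exception:
            # Logged, not surfaced: the buyer already received the value.
            logger.exception("settlement failed after a successful run")
    return status, headers


async def _demo():
    settled = []

    async def verify(token): return token == "good"
    async def ok(): return 200, {}
    async def boom(): return 500, {}
    async def settle(token):
        settled.append(token)
        return "<receipt>"

    return (
        await gated_call("", verify, ok, settle),
        await gated_call("good", verify, boom, settle),
        await gated_call("good", verify, ok, settle),
        settled,
    )


missing, failed, paid, settled = asyncio.run(_demo())
```

In the demo, only the third call reaches settlement: the first is rejected before the agent runs, and the second skips settlement because the agent returned a 500.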

Why /runs/wait specifically

LangSmith Deployment exposes three run-execution shapes:
| Endpoint | Behavior | Works with this middleware? |
| --- | --- | --- |
| POST /threads/{id}/runs/wait | Synchronous; blocks until the agent finishes, returns final state | Yes: the only path that fits verify-then-work-then-settle in one HTTP cycle |
| POST /threads/{id}/runs | Background; returns 202 immediately with a run_id | No: settle would fire before the agent did the work |
| POST /threads/{id}/runs/stream | Server-sent events; streams agent output | Partially: the middleware buffers the response body to attach the settlement header, which negates streaming |
Gate /runs/wait for a clean demo. The middleware will pass through /threads, /assistants/search, /info, /ok, and other non-billable endpoints automatically.

Observability

When LANGSMITH_TRACING=true is set, each gated request produces two top-level traces: the middleware’s own and LangGraph’s:
nvm:x402-request            ← middleware parent trace
├─ nvm:verify                ← child, nvm.* metadata (plan_ids, scheme, network, payer, payment_token abbreviated)
└─ nvm:settlement            ← child, nvm.* metadata (credits_redeemed, balance.after, tx_hash)

my_agent                    ← LangGraph's separate trace (sibling, not nested)
The graph’s trace appears as a sibling top-level trace because langgraph-api initiates it at the graph-invocation boundary, independent of our middleware’s trace context. Both nvm spans plus the parent carry searchable nvm.* metadata; the raw payment-signature token is abbreviated to a form like eyJ4NDAyVmVyc2lvb…bsig so it can be cross-referenced without exposure. Verification failures raise PaymentRequiredError inside verify_span so LangSmith marks the parent + child as failed via the canonical context-manager exit path. Settle failures after a successful 2xx mark only the settle child as failed; the parent stays successful (matching the buyer-visible 200).
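The token abbreviation can be approximated as "keep a short head and tail, drop the middle". The sketch below assumes a 17-character head and 4-character tail, which is inferred from the example string above rather than taken from the SDK; abbreviate_token is a hypothetical name.

```python
def abbreviate_token(token: str, head: int = 17, tail: int = 4) -> str:
    """Keep the first `head` and last `tail` characters of a token."""
    if len(token) <= head + tail:
        # Too short to shorten safely: reveal nothing rather than everything.
        return "…"
    return f"{token[:head]}…{token[-tail:]}"


# Made-up token: a realistic prefix, filler, and a 4-character ending.
fake_token = "eyJ4NDAyVmVyc2lvb" + "A" * 40 + "bsig"
short_form = abbreviate_token(fake_token)
print(short_form)
```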

Host a chat UI on top of the deployment

Runnable tutorial

langchain-chat-ui-nvm — a Next.js fork of LangChain’s agent-chat-ui with a card-delegation popup and an x402-aware proxy. Pairs with the langchain-research-agent-py companion (in-tool @requires_payment) for a full browser demo.
The CLI buyer above is enough to validate the protocol, but most demos want a face. The langchain-chat-ui-nvm tutorial does this by forking langchain-ai/agent-chat-ui and adding a handful of Next.js API routes plus a popup target.
This section describes a chat-UI host built on top of the in-tool gating pattern (@requires_payment from LangChain) rather than the route-level middleware on the rest of this page. The middleware gates every HTTP request — fine for “every call is paid” pricing, but it forces users to pay before they can ask the agent what it does. The chat-UI flow benefits from letting the LLM act as a free concierge (introspection, capability discovery) and only charging when a paid tool actually fires. Pick by UX: the same PaymentMiddleware would work if you want a hard paywall in front of /runs/stream.
The flow:
  1. The user opens the chat and clicks Authorize on a top banner. A popup opens at https://embed.<tier-host>/cards/setup?sessionToken=…&returnUrl=…/x402-callback&state=… (e.g. embed.nevermined.dev for staging, embed.nevermined.app for production — the standalone embed app that replaced the webapp’s removed /embed/* routes).
  2. They enrol a card on the embed page (Stripe, Braintree, or Visa Intelligent Commerce) and pick a budget — e.g. $10 / 24 h — then submit.
  3. Nevermined redirects the popup back to the chat UI’s callback page with paymentMethodId, delegationId, and the round-tripped state nonce. The callback validates state, postMessages the IDs to window.opener, then closes itself.
  4. The chat UI’s Next.js server mints an x402 access token from the delegationId (payments.x402.getX402AccessToken(planId, agentId, { delegationConfig: { delegationId } }), pattern B in @nevermined-io/payments) and stores it in an httpOnly cookie. The browser never sees the raw token.
  5. From then on, the catch-all /api/[..._path] proxy reads the cookie on every outgoing LangGraph run and JSON-injects it into the run body at config.configurable.payment_token — the contract @requires_payment reads from. The agent’s tool runs verify → execute → settle internally; the LLM concierge handles everything else for free.
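The JSON injection in step 5 amounts to setting one nested key on the outgoing run body. The tutorial's proxy is a Next.js route; the Python sketch below mirrors only the body transformation, and inject_payment_token is a hypothetical helper name.

```python
import json


def inject_payment_token(run_body: bytes, token: str) -> bytes:
    """Return the run body with the token at config.configurable.payment_token."""
    body = json.loads(run_body)
    # Create the nested objects if the client did not send them (or sent null).
    config = body.get("config") or {}
    configurable = config.get("configurable") or {}
    configurable["payment_token"] = token
    config["configurable"] = configurable
    body["config"] = config
    return json.dumps(body).encode()


original = b'{"assistant_id": "my_agent", "config": {"configurable": {"thread": "t1"}}}'
patched = json.loads(inject_payment_token(original, "tok-from-cookie"))
bare = json.loads(inject_payment_token(b'{"assistant_id": "my_agent"}', "tok"))
```

Existing configurable keys are preserved, so values the client already set travel alongside the token.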
The NVM_API_KEY lives only on the Next.js server. The browser holds, at most, the short-lived sessionToken for the duration of the popup. Because gating is in-graph (no http.app on langgraph.json), neither /runs/wait nor /runs/stream needs to be in any routes map: the chat UI’s useStream hits /runs/stream and the tool itself decides whether to charge.
Browser → Next.js proxy → LangGraph (vanilla, no middleware)
                              └─ tool ─ @requires_payment ── facilitator
Browser → /api/x402/session → POST /api/v1/widgets/session/self  (mint widget session)
Browser → popup → embed.<tier>/cards/setup                       (user authorizes)
Browser ← postMessage(delegationId) ← /x402-callback              (popup closes)
Browser → /api/x402/token → mints x402 access token, sets cookie
Important constraints:
  • The widget session uses the self-mint endpoint (POST /api/v1/widgets/session/self), which restricts returnUrl to localhost / 127.0.0.1 / [::1]. Deploying the chat UI to a real domain requires the widget-key flow instead.
  • The agent’s paid tool is bound to a single plan in this demo (the chat UI reads NVM_PLAN_ID from env, the tool uses the matching plan id in @requires_payment(plan_id=…)). Multi-tool agents with mixed pricing need one accepts[] entry per paid tool.
  • The popup pattern needs same-origin between the chat UI and the callback page (both served by Next.js).
Full setup, troubleshooting, and architecture notes live in the tutorial README.

Combining with @requires_payment

The middleware and the LangChain decorator can be used together: the middleware gates the agent’s HTTP entry point, and the decorator gates individual tools inside the agent. Each layer charges independently. A common pattern is to charge a flat rate per agent invocation (middleware) plus dynamic per-tool credits (decorator) for expensive tool calls.
The chat-UI tutorial above is an example of using the decorator alone at deployment time (no middleware on the HTTP layer). That choice is driven by the UX (free introspection, paid execution), and it composes cleanly with vanilla LangSmith Deployment because the agent doesn’t need a custom http.app.

Limitations

  • Streaming responses are buffered. The middleware reads the downstream response body in full before attaching the payment-response settlement header. SSE / /runs/stream endpoints become blocking-then-bulk. Gate /runs/wait only, or accept the trade-off.
  • Python only. TypeScript variant tracked but blocked on LangChain shipping a TS runtime.
  • Sync I/O is wrapped. The four sync SDK calls (resolve_scheme, resolve_network, verify_permissions, settle_permissions) run via asyncio.to_thread(...) so they don’t block the event loop. langgraph dev’s blocking-call detector treats unwrapped sync HTTP as fatal warnings.
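The sync-I/O wrapping mentioned in the last bullet follows a standard asyncio pattern. A minimal sketch, assuming Python 3.9+ for asyncio.to_thread: slow_sync_call is a stand-in for a blocking SDK call, not the SDK's actual API, and the heartbeat task exists only to show that the event loop keeps running meanwhile.

```python
import asyncio
import threading
import time


def slow_sync_call(token: str) -> dict:
    """Stand-in for a blocking SDK call (hypothetical, not the SDK's API)."""
    time.sleep(0.05)  # simulates blocking network I/O
    return {"token": token, "thread": threading.current_thread().name}


async def verify_without_blocking(token: str) -> dict:
    # Runs the blocking function in a worker thread; the loop stays free.
    return await asyncio.to_thread(slow_sync_call, token)


async def main() -> tuple:
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    result = await verify_without_blocking("tok")
    beat.cancel()
    return result, ticks


result, ticks = asyncio.run(main())
```

If slow_sync_call were invoked directly inside the coroutine, the heartbeat would not tick at all during the 50 ms sleep, which is the behaviour the blocking-call detector flags.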

See also