Management API

The management API lets you configure projects, view metrics, manage prompts, and administer provider integrations programmatically. Endpoints are under /api/llm/..., except the in-app agent under /api/agent, and all require an X-Project-Id header (set automatically when calling through the website proxy).
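For calls that do not go through the website proxy, the header has to be set by hand. The sketch below builds (but does not send) such a request. The host in BASE_URL, the project ID, and the helper name build_request are placeholders rather than part of the API, and authentication is not covered here.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://example.com"  # assumption: replace with your deployment's host


def build_request(method: str, path: str, project_id: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build a management API request without sending it."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    # Every management endpoint expects the project in this header.
    req.add_header("X-Project-Id", project_id)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Hypothetical project ID, shown only to illustrate the call shape.
request = build_request("GET", "/api/llm/settings",
                        "00000000-0000-0000-0000-000000000000")
```

Passing the result to urllib.request.urlopen would send it.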


Sessions

Base path: /api/llm/sessions

See also: Session Telemetry for correlating OTel spans and logs with sessions.

GET /

List sessions with aggregated metrics.

| Query Param | Type | Required | Description |
| --- | --- | --- | --- |
| project_id | UUID | yes | Project to query |
| limit | integer | no | Max results (default 20, max 1000) |
| offset | integer | no | Pagination offset |
| user_id | string | no | Filter by user |
| name_pattern | string | no | Filter by session name (LIKE) |
| start_date | datetime | no | Filter start |
| end_date | datetime | no | Filter end |

Returns a paginated list of sessions with request_count, total_tokens, total_cost_usd, error_count, and feedback_score.
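A minimal sketch of assembling the query string for this endpoint from the parameters listed above. The helper name sessions_query is hypothetical, and clamping limit on the client side is a convenience, not documented server behavior.

```python
from urllib.parse import urlencode

MAX_LIMIT = 1000  # documented maximum for `limit`
ALLOWED_FILTERS = {"user_id", "name_pattern", "start_date", "end_date"}


def sessions_query(project_id: str, limit: int = 20, offset: int = 0,
                   **filters: str) -> str:
    """Return the query string for GET /api/llm/sessions."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    params = {
        "project_id": project_id,
        "limit": min(limit, MAX_LIMIT),  # stay within the documented cap
        "offset": offset,
    }
    params.update(filters)
    return urlencode(params)


query = sessions_query("my-project-uuid", limit=50, user_id="user-42")
```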

GET /

Get detailed session information including token breakdown, average latency, models used, and feedback.

GET /{session_id}/requests

List individual LLM requests within a session. Supports limit and offset pagination.

POST /{session_id}/feedback

Submit feedback for a session.

json
{
  "project_id": "uuid",
  "score": 4,
  "text": "Very helpful session"
}

score must be 1-5 if provided. At least one of score or text is required.
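The two rules above can be checked on the client before sending. This is a sketch only: the helper name feedback_payload is hypothetical, and treating an empty string as valid text is an assumption.

```python
from typing import Optional


def feedback_payload(project_id: str, score: Optional[int] = None,
                     text: Optional[str] = None) -> dict:
    """Build the body for POST /{session_id}/feedback, enforcing the documented rules."""
    if score is None and text is None:
        raise ValueError("provide at least one of score or text")
    if score is not None and not 1 <= score <= 5:
        raise ValueError("score must be between 1 and 5")
    payload = {"project_id": project_id}
    # Only include the optional fields that were actually supplied.
    if score is not None:
        payload["score"] = score
    if text is not None:
        payload["text"] = text
    return payload
```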


Evaluation Scores

Base path: /api/llm/scores

POST /

Submit a single evaluation score for an LLM request.

json
{
  "project_id": "uuid",
  "request_id": "req-123",
  "score_name": "relevance",
  "score_value": 0.95,
  "score_type": "number",
  "reason": "Highly relevant response",
  "evaluator_type": "human"
}

score_type can be number (0-100), boolean (0 or 1), or category.
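A client-side check of these ranges might look like the following sketch. The function name validate_score is hypothetical. The reference does not say how category values are encoded, so that branch deliberately accepts anything.

```python
def validate_score(score_type: str, score_value) -> None:
    """Raise ValueError if score_value is outside the documented range for score_type."""
    if score_type == "number":
        is_number = isinstance(score_value, (int, float)) and not isinstance(score_value, bool)
        if not is_number or not 0 <= score_value <= 100:
            raise ValueError("number scores must be between 0 and 100")
    elif score_type == "boolean":
        if score_value not in (0, 1):
            raise ValueError("boolean scores must be 0 or 1")
    elif score_type == "category":
        pass  # assumption: no client-side check, the encoding is not documented
    else:
        raise ValueError(f"unknown score_type: {score_type}")
```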

GET /

List scores with optional filters: score_name, evaluator_type, start_date, end_date. Supports limit/offset pagination.

GET /request/

Get all evaluation scores for a specific LLM request.

POST /batch

Submit multiple scores atomically in a single transaction.

json
{
  "project_id": "uuid",
  "scores": [
    { "request_id": "req-1", "score_name": "relevance", "score_value": 0.9 },
    { "request_id": "req-1", "score_name": "coherence", "score_value": 0.85 }
  ]
}

Metrics

Base path: /api/llm/metrics

All metrics endpoints accept project_id (required), start_date, end_date, limit, and offset as query parameters.

GET /users

Per-user LLM usage: request count, session count, tokens, cost, errors, and models used.

GET /models

Per-model metrics including latency percentiles (p50, p95, p99), error rate, token usage, and cost.

GET /cost/daily

Daily cost breakdown: request count, input/output tokens, and total cost per day.

GET /cost/by-model

Cost breakdown by model with percentage of total spend.

GET /overview

Dashboard overview: total requests, sessions, users, tokens, cost, error rate, average latency, and top models. Date range capped at 30 days.
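To cover a period longer than the 30-day cap, one option is to split it into consecutive windows and call the endpoint once per window. The sketch below shows only the splitting. The helper name date_windows is hypothetical, and whether start_date and end_date are inclusive is not stated here, so shared window boundaries may need adjusting.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple

MAX_WINDOW = timedelta(days=30)  # documented cap for /overview


def date_windows(start: datetime, end: datetime) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (start, end) pairs, each at most 30 days long."""
    if end < start:
        raise ValueError("end must not be before start")
    cursor = start
    while cursor < end:
        window_end = min(cursor + MAX_WINDOW, end)
        yield cursor, window_end
        cursor = window_end


windows = list(date_windows(datetime(2026, 1, 1), datetime(2026, 3, 12)))
```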

GET /provider-latency

Real-time provider latency from the in-memory tracker. Returns p50/p95/p99 latency, sample count, and degradation status per provider. No query parameters required.


Pricing

Base path: /api/llm/pricing

GET /

List all model pricing. Supports source filter (helicone, openrouter, manual) and search (ILIKE on model/provider).

GET /

List pricing for a specific provider.

POST /sync

Trigger a manual pricing sync from Helicone and OpenRouter. Returns sync status, counts of models added/updated/skipped, and any errors.

GET /sync-history

Get the last 50 pricing sync log entries.

PUT /{provider}/

Override pricing for a specific model. Sets the source to manual so future auto-syncs won't overwrite it.

json
{
  "input_cost_per_1m": "3.00",
  "output_cost_per_1m": "15.00",
  "cache_read_cost_per_1m": "0.30",
  "cache_write_cost_per_1m": "3.75"
}
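The *_per_1m fields are prices per one million tokens, sent as strings. As a worked example of the arithmetic, the sketch below estimates the cost of one request from token counts. The function name request_cost and the separate cache token counts are assumptions; the platform's own cost calculation may differ.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)


def request_cost(pricing: dict, input_tokens: int = 0, output_tokens: int = 0,
                 cache_read_tokens: int = 0, cache_write_tokens: int = 0) -> Decimal:
    """Estimate cost in USD: tokens / 1,000,000 * price per 1M, summed over token kinds."""
    usage = {
        "input_cost_per_1m": input_tokens,
        "output_cost_per_1m": output_tokens,
        "cache_read_cost_per_1m": cache_read_tokens,
        "cache_write_cost_per_1m": cache_write_tokens,
    }
    total = Decimal(0)
    for field, tokens in usage.items():
        if tokens:  # skip unused token kinds so their prices may be absent
            total += Decimal(pricing[field]) * tokens / PER_MILLION
    return total


pricing = {"input_cost_per_1m": "3.00", "output_cost_per_1m": "15.00"}
cost = request_cost(pricing, input_tokens=1000, output_tokens=500)
# 1000 * 3.00 / 1e6 + 500 * 15.00 / 1e6 = 0.003 + 0.0075 = 0.0105
```

Decimal avoids the rounding drift that float arithmetic would introduce when many small costs are summed.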

Prompts and Rollouts

Base path: /api/llm/prompts

Prompt Configs

POST /configs

Create a prompt config.

json
{
  "project_id": "uuid",
  "name": "customer-support-v2",
  "description": "Customer support system prompt"
}

GET /configs

List prompt configs for a project. Returns config metadata, active version number, version count, and whether an active rollout exists.

GET /configs/

Get a single prompt config with version metadata.

PUT /configs/

Update a prompt config's name or description.

DELETE /configs/

Delete a prompt config. Blocked if an active rollout exists. Cascades to all versions.

Prompt Versions

POST /configs/{config_id}/versions

Create a new prompt version. Version numbers auto-increment. The first version is automatically activated.

json
{
  "project_id": "uuid",
  "system_prompt": "You are a helpful assistant for {{company_name}}.",
  "model": "gpt-4o",
  "temperature": 0.7,
  "max_tokens": 1024,
  "variables": [
    { "name": "company_name", "type": "string", "required": true }
  ],
  "tools": [...],
  "response_format": { "type": "json_schema", "json_schema": {...} },
  "commit_message": "Added company personalization"
}
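The {{company_name}} placeholder is paired with an entry in variables. To illustrate how such a template could be previewed locally, the sketch below substitutes {{name}} placeholders and checks required variables. The substitution rules and the function name render_prompt are assumptions; the gateway's actual rendering is not described in this reference.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, variables: list, values: dict) -> str:
    """Replace {{name}} placeholders, failing if a required variable has no value."""
    missing = [v["name"] for v in variables
               if v.get("required") and v["name"] not in values]
    if missing:
        raise KeyError(f"missing required variables: {missing}")

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Leave placeholders without a supplied value untouched.
        return str(values[name]) if name in values else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


rendered = render_prompt(
    "You are a helpful assistant for {{company_name}}.",
    [{"name": "company_name", "type": "string", "required": True}],
    {"company_name": "Acme"},
)
```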

GET /configs/{config_id}/versions

List versions for a prompt config.

GET /configs/{config_id}/versions/

Get a single prompt version.

Rollouts

POST /rollouts

Create a progressive rollout (A/B test) between a baseline and target prompt version.

json
{
  "project_id": "uuid",
  "config_id": "uuid",
  "target_version_id": "uuid",
  "name": "v3-canary",
  "mode": "auto",
  "allocation_type": "user_sticky",
  "stages": [
    { "weight": 10, "min_duration_minutes": 30, "min_requests": 500 },
    { "weight": 50, "min_duration_minutes": 60 },
    { "weight": 100 }
  ]
}

mode can be auto (worker evaluates metrics and promotes/rolls back) or manual.

allocation_type can be random, user_sticky, or session_sticky.

If stages is omitted, a default 7-stage progression is used (1% to 100%).
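To give an intuition for how a sticky allocation type can interact with a stage weight, the sketch below hashes a user or session key into one of 100 buckets and sends it to the target version when the bucket falls below the weight. This is an illustration only: the gateway's real allocation algorithm is not documented here, and the names bucket and uses_target are hypothetical.

```python
import hashlib


def bucket(key: str, rollout_id: str) -> int:
    """Map a user or session key to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{rollout_id}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def uses_target(key: str, rollout_id: str, weight: int) -> bool:
    """True if this key gets the target version at the given stage weight (0-100)."""
    return bucket(key, rollout_id) < weight
```

Because the bucket depends only on the key and the rollout, a key assigned to the target at weight 10 stays on the target at weight 50 and 100, which is the property a sticky allocation is meant to provide.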

GET /rollouts

List rollouts. Supports filtering by config_id and status.

GET /rollouts/

Get a rollout with all its stages.

POST /rollouts/{rollout_id}/start

Start a pending rollout.

POST /rollouts/{rollout_id}/pause

Pause a running rollout. Requests fall through without a managed prompt while paused.

POST /rollouts/{rollout_id}/promote

Manually advance to the next stage (or complete if at the final stage).

POST /rollouts/{rollout_id}/rollback

Roll back to baseline.

POST /rollouts/{rollout_id}/complete

Force-complete a rollout, skipping remaining stages. Sets the target version as active.

GET /rollouts/{rollout_id}/metrics

Get target vs baseline metrics comparison: request count, error rate, latency (avg + p95), cost, and an overall passing/failing/inconclusive status.


Playground

Base path: /api/llm/playground

POST /

Run a single playground prompt.

json
{
  "project_id": "uuid",
  "model": "claude-3-5-sonnet",
  "messages": [
    { "role": "user", "content": "Explain quantum computing simply." }
  ],
  "temperature": 0.7,
  "auto_evaluate": true
}

Returns the model response, token usage, latency, cost, and optional LLM-as-judge evaluation scores (relevance, coherence, helpfulness).

POST /compare

Compare the same prompt across multiple models in parallel (max 5, 30s timeout per model).

json
{
  "project_id": "uuid",
  "messages": [{ "role": "user", "content": "Hello!" }],
  "compare_models": ["gpt-4o", "claude-3-5-sonnet", "gemini-2.0-flash"]
}

Returns per-model responses plus a cost comparison with tokens_per_dollar, and identifies the fastest and cheapest model.
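The comparison fields can be reproduced from per-model results, as in the sketch below. The input field names (model, total_tokens, cost_usd, latency_ms) are assumptions about the response shape, and tokens_per_dollar is taken here to mean total tokens divided by cost.

```python
def summarize(results: list) -> dict:
    """Add tokens_per_dollar to each result and pick the fastest and cheapest model."""
    rows = []
    for result in results:
        cost = result["cost_usd"]
        # Avoid dividing by zero for free or unpriced models.
        tokens_per_dollar = result["total_tokens"] / cost if cost > 0 else None
        rows.append({**result, "tokens_per_dollar": tokens_per_dollar})
    return {
        "models": rows,
        "fastest": min(results, key=lambda r: r["latency_ms"])["model"],
        "cheapest": min(results, key=lambda r: r["cost_usd"])["model"],
    }


summary = summarize([
    {"model": "model-a", "total_tokens": 1000, "cost_usd": 0.5, "latency_ms": 900},
    {"model": "model-b", "total_tokens": 1000, "cost_usd": 0.25, "latency_ms": 1200},
])
```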


Search

Base path: /api/llm/search

POST /

Search LLM request history by text matching on prompts and completions.

json
{
  "project_id": "uuid",
  "query": "password reset flow",
  "limit": 10,
  "model": "gpt-4o",
  "start_time": "2026-01-01T00:00:00Z"
}

Returns matching requests with content previews.


Provider Integrations

Base path: /api/llm/integrations

GET /

List all configured provider integrations for the project.

POST /

Add a provider integration. API keys are encrypted at rest.

json
{
  "provider": "openai",
  "api_key": "sk-...",
  "enabled": true
}

For AWS Bedrock, use access_key_id, secret_access_key, and region instead of api_key.
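A sketch of choosing the right credential fields per provider. The provider identifier "bedrock" is an assumption (this reference does not give the exact string), and integration_payload is a hypothetical helper.

```python
BEDROCK = "bedrock"  # assumption: the actual provider identifier may differ
BEDROCK_FIELDS = {"access_key_id", "secret_access_key", "region"}


def integration_payload(provider: str, enabled: bool = True, **credentials: str) -> dict:
    """Build the body for POST /api/llm/integrations with the expected credential fields."""
    required = BEDROCK_FIELDS if provider == BEDROCK else {"api_key"}
    if set(credentials) != required:
        raise ValueError(f"{provider} expects exactly these fields: {sorted(required)}")
    return {"provider": provider, **credentials, "enabled": enabled}
```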

PUT /

Update an integration (rotate key, toggle enabled).

DELETE /

Remove a provider integration and its stored key.

POST /{provider}/test

Test connectivity to a provider. Optionally pass a temporary key to test before saving.


Gateway Settings

Base path: /api/llm/settings

GET /

Get all gateway settings for the project.

PUT /

Update gateway settings. Accepts the full settings object:

json
{
  "introspection_enabled": true,
  "thinking_budget_tokens": 10000,
  "fallback_enabled": true,
  "fallback_order": ["anthropic", "openai", "google"],
  "retry_enabled": true,
  "retry_max_attempts": 3,
  "monthly_budget_usd": 500.0,
  "budget_alert_enabled": true,
  "budget_hard_stop": false,
  "rate_limit_enabled": true,
  "rate_limit_rpm": 120,
  "rate_limit_tpm": 200000,
  "session_budget_usd": 1.50,
  "guardrails": {
    "trust_mode": "agent",
    "blocked_input_topics": ["violence"],
    "blocked_output_topics": [],
    "max_prompt_tokens": 4096,
    "pii_block_on_detect": false,
    "prompt_injection_detection": true,
    "spotlighting_enabled": true,
    "mask_output_pii": true,
    "min_quality_score": 0.7,
    "blocked_tools": ["send_email", "execute_sql"],
    "block_exfiltration_urls": true
  },
  "agent_enabled": true,
  "agent_scopes": ["project:read", "llm:read", "observability:read", "billing:read"]
}
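Because PUT takes the full settings object, changing one field usually means a read-modify-write: fetch the current settings, merge the change, and send the whole object back. The sketch below shows the merge step only. The helper name merge_settings is hypothetical, and whether the server tolerates partial objects is not stated here.

```python
import copy


def merge_settings(current: dict, changes: dict) -> dict:
    """Return a copy of current with changes applied, merging nested objects key by key."""
    merged = copy.deepcopy(current)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects such as guardrails keep their untouched keys.
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


current = {"rate_limit_rpm": 120,
           "guardrails": {"trust_mode": "agent", "mask_output_pii": True}}
updated = merge_settings(current, {"rate_limit_rpm": 60,
                                   "guardrails": {"mask_output_pii": False}})
```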

In-App Agent

Base path: /api/agent

The in-app AI agent provides a conversational interface backed by the platform's MCP tools. Responses stream via Server-Sent Events (SSE).

POST /chat

Send a message to the agent. Returns an SSE stream of events.

json
{
  "conversation_id": "uuid (optional — omit to start a new conversation)",
  "message": "What's my LLM usage this week?",
  "page_context": {
    "page": "/projects/my-project/llm/overview",
    "data": {}
  }
}

SSE event types:

| Event type | Fields | Description |
| --- | --- | --- |
| conversation_created | conversation_id | Emitted at the start of a new conversation |
| text_delta | content | Streamed text chunk from the assistant |
| tool_start | call_id, name | A tool call has started |
| tool_result | call_id, content | Tool execution completed |
| done | | Stream is complete |
| error | content | An error occurred |
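A minimal sketch of consuming these events on the client. It assumes standard SSE framing (event: and data: lines, with a blank line ending each event) and JSON data payloads. Whether the event type arrives in the event: field or as a type key inside the JSON is not specified here, so the parser accepts either. The names parse_sse and collect_text are hypothetical.

```python
import json
from typing import Iterable, Iterator


def _field_value(line: str, prefix: str) -> str:
    """Return the text after 'prefix', dropping one leading space as SSE framing allows."""
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event, with the event type stored under 'type'."""
    event_name, data_lines = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates the current event
            if event_name is not None or data_lines:
                payload = json.loads("\n".join(data_lines)) if data_lines else {}
                if event_name is not None:
                    payload.setdefault("type", event_name)
                yield payload
            event_name, data_lines = None, []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event_name = _field_value(line, "event:")
        elif line.startswith("data:"):
            data_lines.append(_field_value(line, "data:"))


def collect_text(events: Iterable[dict]) -> str:
    """Concatenate text_delta chunks until done; raise if an error event arrives."""
    parts = []
    for event in events:
        kind = event.get("type")
        if kind == "text_delta":
            parts.append(event["content"])
        elif kind == "error":
            raise RuntimeError(event.get("content", "agent error"))
        elif kind == "done":
            break
    return "".join(parts)
```

In practice the lines would come from the decoded body of the POST /chat response read line by line.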

GET /conversations

List conversations for the current project.

| Query Param | Type | Default | Description |
| --- | --- | --- | --- |
| limit | integer | 50 | Max results (max 100) |
| offset | integer | 0 | Pagination offset |
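The limit and offset parameters can drive a simple pagination loop, sketched below. The fetch_page callable stands in for the actual HTTP call, and both the list-shaped response and the "short page means last page" stopping rule are assumptions, since the response envelope is not described here.

```python
from typing import Callable, Iterator, List


def iter_items(fetch_page: Callable[[int, int], List[dict]],
               limit: int = 50) -> Iterator[dict]:
    """Yield every item by requesting pages of `limit` items at increasing offsets."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:  # assumption: a short page means there is nothing left
            return
        offset += limit


# Stand-in for the real endpoint: serves slices of an in-memory list.
conversations = [{"id": i} for i in range(120)]
all_items = list(iter_items(lambda limit, offset: conversations[offset:offset + limit]))
```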

POST /conversations

Create a new conversation.

json
{
  "title": "Optional conversation title"
}

DELETE /conversations/

Delete a conversation and all its messages. Returns 404 if the conversation does not exist.

GET /conversations/{id}/messages

List messages in a conversation, ordered chronologically.

| Query Param | Type | Default | Description |
| --- | --- | --- | --- |
| limit | integer | 100 | Max results |
| offset | integer | 0 | Pagination offset |