Features
Flow provides a suite of features on top of basic LLM routing. All features work across every supported provider.
Automatic Failover
When a provider returns an error or times out, Flow automatically retries with exponential backoff. If the primary provider is unavailable, it falls back to an alternate provider that supports a compatible model.
Response headers indicate when failover occurs:
| Header | Description |
|---|---|
| `x-datahippo-fallback-used` | `"true"` when a fallback provider served the response |
| `x-datahippo-original-model` | The model originally requested |
| `x-datahippo-model-used` | The model that actually served the response |
| `x-datahippo-retry-count` | Number of retries before success |
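Client-side, these headers can be turned into a structured summary. A minimal sketch, where `failover_info` is a hypothetical helper and how you access response headers depends on your HTTP client:

```python
def failover_info(headers):
    """Interpret Flow's failover headers (hypothetical helper)."""
    return {
        "fallback_used": headers.get("x-datahippo-fallback-used") == "true",
        "original_model": headers.get("x-datahippo-original-model"),
        "model_used": headers.get("x-datahippo-model-used"),
        "retries": int(headers.get("x-datahippo-retry-count", 0)),
    }

# Example: a response served by a fallback provider after one retry
info = failover_info({
    "x-datahippo-fallback-used": "true",
    "x-datahippo-original-model": "gpt-4o",
    "x-datahippo-model-used": "claude-3-5-sonnet",
    "x-datahippo-retry-count": "1",
})
```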
Semantic Caching
Flow caches responses using a two-layer cache:
- L1 — In-process LRU cache for sub-millisecond lookups.
- L2 — Distributed semantic cache (Redis-backed) shared across gateway instances.
Caching is eligible when:
- `stream` is `false` (or absent)
- `temperature` is `0`
- No tools are specified
- `n` is `1` (or absent)
The `x-datahippo-cache` response header reports `"hit"`, `"miss"`, or `"skip"`.
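The eligibility rules can be expressed as a simple predicate over the request body. This is an illustrative re-statement of the rules above, not the gateway's own code:

```python
def is_cache_eligible(request: dict) -> bool:
    """Mirror Flow's cache-eligibility rules (illustrative only)."""
    if request.get("stream", False):        # stream must be false or absent
        return False
    if request.get("temperature", 0) != 0:  # temperature must be 0
        return False
    if request.get("tools"):                # no tools may be specified
        return False
    if request.get("n", 1) != 1:            # n must be 1 or absent
        return False
    return True
```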
Guardrails
Guardrails run before and after the LLM call to enforce content policies.
Input Guardrails
| Guardrail | Description |
|---|---|
| Topic blocklist | Rejects requests that match blocked topics. |
| Token cap | Rejects requests exceeding a configurable token limit. |
| PII block-on-detect | Blocks the request entirely if PII is detected in the input. |
Output Guardrails
| Guardrail | Description |
|---|---|
| PII masking | Redacts PII (names, emails, phone numbers, etc.) from the response before returning it. |
| Topic blocklist | Blocks responses that match forbidden topics. |
| LLM-as-judge | Uses a secondary LLM call to evaluate the response against custom criteria. |
When a guardrail triggers, the response includes a structured error with the rule that fired (e.g., `pii_blocked`, `token_limit`, `blocked_input_topic`, `blocked_output_topic`).
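Callers can dispatch on the rule name to surface a friendlier message. A sketch, assuming the error payload carries the rule under a `code` key (the exact payload shape may differ; only the rule names come from this document):

```python
def explain_guardrail(error: dict) -> str:
    """Map a Flow guardrail rule name to a human-readable explanation."""
    messages = {
        "pii_blocked": "Input contained PII and was blocked.",
        "token_limit": "Request exceeded the configured token cap.",
        "blocked_input_topic": "Input matched a blocked topic.",
        "blocked_output_topic": "Response matched a blocked topic.",
    }
    return messages.get(error.get("code"), "Unknown guardrail rule.")
```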
PII Masking
PII masking can be enabled independently of guardrails. When active, it scans both input and output for personally identifiable information and redacts it. This operates transparently — your application receives the redacted text without needing to handle PII detection itself.
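To illustrate the effect, here is a toy redactor covering just emails and phone numbers. Flow's actual PII detection is considerably more sophisticated than these two regexes; this only shows the input/output shape an application sees:

```python
import re

# Toy patterns for illustration only; not Flow's real detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text: str) -> str:
    """Replace detected PII spans with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text
```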
Session Budgets
Set a per-session cost limit using the x-datahippo-session-id header and the project's gateway_session_budget_usd setting. Once a session exceeds the budget, further requests are rejected with a 429 error. This prevents runaway costs from chatbot loops or automated agents.
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={"x-datahippo-session-id": "session-abc-123"}
)
```
Output Contracts
Define a JSON schema in your prompt version's response format, and Flow will validate the LLM's response against it. This is useful for structured output where downstream code expects a specific shape.
When validation fails, the behavior depends on the configured output_failure_action:
| Action | Description |
|---|---|
| `error` | Return an error to the caller. |
| `retry` | Retry the LLM call (up to a limit). |
| `retry_then_passthrough` | Retry, then pass through the invalid response if retries are exhausted. |
| `log_only` | Pass through the response but log the violation. |
The `x-output-contract-violation` response header is set to `"true"` when a passthrough occurs.
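The `retry_then_passthrough` flow can be sketched as a small loop. `call_llm` and `validate` are hypothetical stand-ins for the gateway's LLM call and its schema validation; this is not Flow's implementation:

```python
def with_output_contract(call_llm, validate, max_retries=2):
    """Retry on contract violation, passing through if retries run out.

    Returns (response, violation_flag); violation_flag mirrors the
    x-output-contract-violation header on passthrough.
    """
    response = call_llm()
    for _ in range(max_retries):
        if validate(response):
            return response, False       # valid response, no violation
        response = call_llm()            # retry the LLM call
    return response, not validate(response)  # passthrough if still invalid
```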
Extended Thinking / Introspection
Flow supports extended thinking for models that offer it:
- Anthropic Claude — Extended thinking with configurable budget tokens.
- OpenAI o-series — Reasoning effort levels (`low`, `medium`, `high`).
- Google Gemini — Gemini thinking mode.
```python
response = client.chat.completions.create(
    model="claude-3-5-sonnet",
    messages=[{"role": "user", "content": "Solve this step by step."}],
    extra_body={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10000
        }
    }
)
```
Multimodal Support
Flow supports multimodal messages across providers. Send images and documents alongside text:
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
        ]
    }]
)
```
Document attachments (PDF, etc.) are also supported via `document_url` content parts.
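A document part might look like the following. The exact nesting of the `document_url` part is an assumption modeled on the `image_url` shape above; check the Flow reference for the canonical form:

```python
# Hypothetical document content part, assumed to parallel image_url.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Summarize this PDF."},
        {"type": "document_url", "document_url": {"url": "https://example.com/report.pdf"}},
    ],
}
```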
Cost Tracking
Every request is logged with token usage and cost, broken down by provider. This data is available in the DataHippo dashboard for monitoring spend across models, projects, and time periods.
Observability
All gateway requests are recorded with:
- Request and response payloads
- Token counts (input, output, total)
- Latency (end-to-end and provider time)
- Provider and model used
- Cache hit/miss status
- Prompt version and rollout variant (if applicable)
- Error details (if any)
This data integrates with DataHippo Watch for end-to-end tracing — a single trace can show the API request, the LLM call it triggered, and the cost of each step.
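The recorded fields above map naturally onto a structured log record. An illustrative shape only; the real record schema is not specified here:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GatewayLogRecord:
    """Illustrative record naming the fields Flow logs per request."""
    request_payload: dict
    response_payload: dict
    input_tokens: int
    output_tokens: int
    total_tokens: int
    latency_ms: float           # end-to-end
    provider_latency_ms: float  # provider time only
    provider: str
    model: str
    cache_status: str           # "hit", "miss", or "skip"
    prompt_version: Optional[str] = None
    rollout_variant: Optional[str] = None
    error: Optional[str] = None
```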