
Features

Flow provides a suite of features on top of basic LLM routing. All features work across every supported provider.

Automatic Failover

When a provider returns an error or times out, Flow automatically retries with exponential backoff. If the primary provider is unavailable, it falls back to an alternate provider that supports a compatible model.

Response headers indicate when failover occurs:

Header                     | Description
---------------------------|----------------------------------------------------
x-datahippo-fallback-used  | "true" when a fallback provider served the response
x-datahippo-original-model | The model originally requested
x-datahippo-model-used     | The model that actually served the response
x-datahippo-retry-count    | Number of retries before success
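
The retry-then-fallback behavior can be approximated with a short sketch. The `call_provider` callable and provider names below are hypothetical stand-ins, not the gateway's internals:

```python
import time

def call_with_failover(call_provider, providers, max_retries=3, base_delay=0.5):
    """Try each provider in order; retry transient failures with exponential backoff."""
    retry_count = 0
    for provider in providers:
        for attempt in range(max_retries):
            try:
                result = call_provider(provider)
                meta = {"fallback_used": provider != providers[0],
                        "retry_count": retry_count}
                return result, meta
            except TimeoutError:  # treat timeouts as transient
                retry_count += 1
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError("all providers exhausted")
```

The returned `meta` dict mirrors the information the gateway surfaces in the `x-datahippo-fallback-used` and `x-datahippo-retry-count` headers.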

Semantic Caching

Flow caches responses using a two-layer cache:

  • L1 — In-process LRU cache for sub-millisecond lookups.
  • L2 — Distributed semantic cache (Redis-backed) shared across gateway instances.

Caching is eligible when:

  • stream is false (or absent)
  • temperature is 0
  • No tools are specified
  • n is 1 (or absent)

The x-datahippo-cache response header reports "hit", "miss", or "skip".
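
The eligibility rules above amount to a small predicate over the request body. A sketch, assuming OpenAI-style field names:

```python
def cache_eligible(request: dict) -> bool:
    """Return True if a request body meets the caching eligibility rules."""
    if request.get("stream", False):           # stream must be false or absent
        return False
    if request.get("temperature") != 0:        # temperature must be exactly 0
        return False
    if request.get("tools"):                   # no tools may be specified
        return False
    if request.get("n", 1) != 1:               # n must be 1 or absent
        return False
    return True
```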

Guardrails

Guardrails run before and after the LLM call to enforce content policies.

Input Guardrails

Guardrail           | Description
--------------------|--------------------------------------------------------------
Topic blocklist     | Rejects requests that match blocked topics.
Token cap           | Rejects requests exceeding a configurable token limit.
PII block-on-detect | Blocks the request entirely if PII is detected in the input.

Output Guardrails

Guardrail       | Description
----------------|------------------------------------------------------------------------------------------
PII masking     | Redacts PII (names, emails, phone numbers, etc.) from the response before returning it.
Topic blocklist | Blocks responses that match forbidden topics.
LLM-as-judge    | Uses a secondary LLM call to evaluate the response against custom criteria.

When a guardrail triggers, the response includes a structured error with the rule that fired (e.g., pii_blocked, token_limit, blocked_input_topic, blocked_output_topic).
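
Client code can branch on the rule name in that error. A sketch, assuming the error body exposes the rule under a `code` field (the exact payload shape is an assumption; check the error you actually receive):

```python
def handle_guardrail_error(error: dict) -> str:
    """Map a guardrail rule name to a user-facing message."""
    code = error.get("code", "")  # 'code' field naming is an assumption
    if code == "pii_blocked":
        return "Request contained PII and was blocked."
    if code == "token_limit":
        return "Request exceeded the configured token cap."
    if code in ("blocked_input_topic", "blocked_output_topic"):
        return "Content touched a blocked topic."
    return "Unknown guardrail violation."
```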

PII Masking

PII masking can be enabled independently of guardrails. When active, it scans both input and output for personally identifiable information and redacts it. This operates transparently — your application receives the redacted text without needing to handle PII detection itself.
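
To illustrate the kind of transformation masking performs, here is a toy redactor for emails and phone numbers. The gateway's actual detector is more sophisticated; these regex patterns are purely illustrative:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text: str) -> str:
    """Replace detected PII spans with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text
```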

Session Budgets

Set a per-session cost limit using the x-datahippo-session-id header and the project's gateway_session_budget_usd setting. Once a session exceeds the budget, further requests are rejected with a 429 error. This prevents runaway costs from chatbot loops or automated agents.

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={"x-datahippo-session-id": "session-abc-123"}
)
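
Conceptually, the budget check accumulates per-session cost and rejects once the limit is crossed. A minimal sketch (the class and storage here are illustrative, not the gateway's implementation):

```python
class SessionBudget:
    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent = {}  # session_id -> accumulated cost in USD

    def charge(self, session_id: str, cost_usd: float) -> bool:
        """Record cost; return False (maps to a 429) once the budget is exceeded."""
        if self.spent.get(session_id, 0.0) >= self.budget_usd:
            return False
        self.spent[session_id] = self.spent.get(session_id, 0.0) + cost_usd
        return True
```

Note that the request which crosses the budget is still served; only subsequent requests are rejected, matching "once a session exceeds the budget, further requests are rejected."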

Output Contracts

Define a JSON schema in your prompt version's response format, and Flow will validate the LLM's response against it. This is useful for structured output where downstream code expects a specific shape.

When validation fails, the behavior depends on the configured output_failure_action:

Action                 | Description
-----------------------|------------------------------------------------------------------------
error                  | Return an error to the caller.
retry                  | Retry the LLM call (up to a limit).
retry_then_passthrough | Retry, then pass through the invalid response if retries are exhausted.
log_only               | Pass through the response but log the violation.

The x-output-contract-violation response header is set to "true" when a passthrough occurs.
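
The validate-then-act flow can be sketched as follows. The toy `validate` only checks required keys and their types; a real deployment would use a full JSON Schema validator, and `llm_retry` is a hypothetical callable that re-invokes the model:

```python
def validate(response: dict, schema: dict) -> bool:
    """Check required keys and expected types (toy subset of JSON Schema)."""
    return all(k in response and isinstance(response[k], t) for k, t in schema.items())

def apply_contract(response, schema, action, llm_retry, max_retries=2):
    """Dispatch on output_failure_action when the response violates the schema."""
    if validate(response, schema):
        return response
    if action == "error":
        raise ValueError("output contract violation")
    if action in ("retry", "retry_then_passthrough"):
        for _ in range(max_retries):
            response = llm_retry()
            if validate(response, schema):
                return response
        if action == "retry_then_passthrough":
            return response            # pass through the invalid response
        raise ValueError("output contract violation: retries exhausted")
    return response                    # log_only: pass through (logging elided)
```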

Extended Thinking / Introspection

Flow supports extended thinking for models that offer it:

  • Anthropic Claude — Extended thinking with configurable budget tokens.
  • OpenAI o-series — Reasoning effort levels (low, medium, high).
  • Google Gemini — Gemini thinking mode.

python
response = client.chat.completions.create(
    model="claude-3-5-sonnet",
    messages=[{"role": "user", "content": "Solve this step by step."}],
    extra_body={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10000
        }
    }
)

Multimodal Support

Flow supports multimodal messages across providers. Send images and documents alongside text:

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
        ]
    }]
)

Document attachments (PDF, etc.) are also supported via document_url content parts.

Cost Tracking

Every request is logged with token usage and cost, broken down by provider. This data is available in the DataHippo dashboard for monitoring spend across models, projects, and time periods.
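
Per-request cost is a function of token counts and per-model rates. A sketch with illustrative rates per million tokens (not real pricing, which varies by provider and model):

```python
# Illustrative per-million-token rates; real pricing varies by provider and model.
RATES_PER_M = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute the cost of one request from its token usage."""
    rates = RATES_PER_M[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000
```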

Observability

All gateway requests are recorded with:

  • Request and response payloads
  • Token counts (input, output, total)
  • Latency (end-to-end and provider time)
  • Provider and model used
  • Cache hit/miss status
  • Prompt version and rollout variant (if applicable)
  • Error details (if any)

This data integrates with DataHippo Watch for end-to-end tracing — a single trace can show the API request, the LLM call it triggered, and the cost of each step.
