Pond — Data Warehouse

Pond is DataHippo's federated data warehouse. Connect your databases, SaaS tools, files, and streaming sources, then query them all with standard SQL through any PostgreSQL client. No ETL pipelines required.

INFO

Detailed documentation for Pond is coming soon. This page provides a feature overview.

Features

Federated SQL

Query across all connected data sources using standard SQL. Pond's federation planner optimizes cross-source joins automatically.
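To make the planner's job concrete, here is a minimal sketch of predicate pushdown, the core optimization a federation planner applies to cross-source joins: filters that reference only one source's columns are evaluated at that source, so fewer rows cross the network. All names and data structures below are illustrative, not Pond's internals.

```python
def plan(filters, source_columns):
    """Assign each filter to the single source that can evaluate it,
    keeping the rest for the post-join stage.

    filters: list of (referenced_columns, predicate_sql) pairs
    source_columns: {source_name: set of columns that source owns}
    """
    pushed = {source: [] for source in source_columns}
    residual = []
    for cols, predicate in filters:
        # Which sources own every column the predicate references?
        owners = {s for s, owned in source_columns.items() if cols <= owned}
        if len(owners) == 1:
            pushed[owners.pop()].append(predicate)
        else:
            residual.append(predicate)  # needs rows from more than one side
    return pushed, residual

source_columns = {
    "stripe": {"charges.amount", "charges.created"},
    "postgres": {"orders.id", "orders.status"},
}
filters = [
    ({"charges.created"}, "charges.created > '2025-01-01'"),
    ({"orders.status"}, "orders.status = 'paid'"),
    ({"charges.amount", "orders.id"}, "charges.amount > orders.id"),
]
pushed, residual = plan(filters, source_columns)
```

Here the first two filters run inside Stripe and Postgres respectively; only the cross-source comparison remains for the join stage.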

PostgreSQL Wire Protocol

Connect with any PostgreSQL client — psql, Metabase, Grafana, Tableau, DBeaver, or your application's existing database driver.

```bash
psql -h pond.datahippo.io -p 5433 -U dh_your_key
```

```sql
SELECT c.amount, o.status
FROM stripe.charges c
JOIN postgres.orders o ON c.metadata->>'order_id' = o.id::text
WHERE c.created > '2025-01-01';
```

Data Sources

Databases

  • PostgreSQL, MySQL, ClickHouse, SQL Server, MongoDB
  • Snowflake, BigQuery, Redshift, SQLite

SaaS

  • Stripe, Google Sheets

Files

  • CSV, Excel, JSON / NDJSON

Object Storage

  • S3, Cloudflare R2 (Parquet files)

Streaming

  • Kafka, AWS Kinesis

Blockchain

  • Ethereum, Solana, Bitcoin, Polygon

Storage Tiers

  • Hot — ClickHouse-backed for frequently queried data with sub-second latency.
  • Warm — R2/S3 Parquet files with skip indexes for cost-efficient analytical queries.
  • Cold — On-demand federation for infrequently accessed sources.
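The tiers trade latency against cost by access pattern. A hypothetical sketch of how a tiering policy might route a table; the thresholds and function name are illustrative, not Pond's actual policy engine:

```python
def choose_tier(queries_per_day: float) -> str:
    """Route a table to a storage tier based on query frequency.
    Thresholds are made-up examples for illustration."""
    if queries_per_day >= 100:
        return "hot"   # ClickHouse-backed, sub-second latency
    if queries_per_day >= 1:
        return "warm"  # Parquet on R2/S3 with skip indexes
    return "cold"      # federate to the origin source on demand
```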

Skip Index System

Parquet files are indexed with FST (finite state transducer) indexes, Xor filters, and MinMax statistics for efficient file pruning. Only files that can contain matching rows are scanned during queries.
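MinMax pruning is the simplest of the three to illustrate: a file whose column range cannot satisfy the predicate is skipped without being read. The data layout below is a made-up example, not Pond's on-disk format.

```python
# Per-file min/max statistics for the `created` column (illustrative).
files = [
    {"path": "part-0.parquet", "created_min": "2024-01-01", "created_max": "2024-12-31"},
    {"path": "part-1.parquet", "created_min": "2025-01-01", "created_max": "2025-06-30"},
    {"path": "part-2.parquet", "created_min": "2025-07-01", "created_max": "2025-12-31"},
]

def prune(files, max_key, value):
    """For a predicate `col > value`, keep only files whose maximum
    can exceed the value; all others are skipped entirely."""
    return [f["path"] for f in files if f[max_key] > value]

# WHERE created > '2025-01-01' scans two of the three files.
to_scan = prune(files, "created_max", "2025-01-01")
```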

Catalog Service

Automatic schema discovery and lineage tracking across all connected sources. See column-level dependencies and data flow.
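Column-level lineage can be thought of as a dependency graph: each derived column lists the upstream columns it is computed from, and tracing a column back to its roots answers "where did this value originate?". The graph contents below are hypothetical examples, not output of Pond's catalog API.

```python
# Each key maps a column to the upstream columns it depends on (illustrative).
upstream = {
    "warehouse.revenue.total": ["stripe.charges.amount", "postgres.orders.status"],
    "stripe.charges.amount": [],
    "postgres.orders.status": ["postgres.orders.raw_status"],
    "postgres.orders.raw_status": [],
}

def sources_of(column, graph):
    """Walk the lineage graph back to root source columns."""
    deps = graph.get(column, [])
    if not deps:
        return {column}  # no upstream dependencies: this is a source column
    roots = set()
    for dep in deps:
        roots |= sources_of(dep, graph)
    return roots

roots = sources_of("warehouse.revenue.total", upstream)
```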

Sync Pipeline

Configurable sync schedules with interval, cron, and manual triggers. Track sync status, row counts, and error rates.
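For the interval trigger, the scheduling logic reduces to computing the next due time from the last run; cron and manual triggers layer on top of the same idea. A minimal sketch, with function and field names that are illustrative rather than Pond's configuration schema:

```python
from datetime import datetime, timedelta

def next_run(last_run: datetime, interval_minutes: int, now: datetime) -> datetime:
    """Next due time on a fixed interval; if already overdue, run now."""
    due = last_run + timedelta(minutes=interval_minutes)
    return due if due > now else now
```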

Natural Language Queries

Ask questions in plain English and Pond translates them to SQL across your connected sources.