ClickHouse and Metabase with Docker: Setup Guide Under 30 Minutes

How to Set Up ClickHouse and Metabase with Docker Compose

ClickHouse + Metabase via Docker Compose — an open-source OLAP analytics stack that handles billions of rows and gives every business team self-serve dashboards, deployable in under 30 minutes.

Most analytics infrastructure conversations in the enterprise Microsoft ecosystem start and end with Power BI and Fabric — and for good reason. But there is a category of use case where a lightweight, open-source OLAP stack is the faster, simpler, and more cost-effective path: a startup building its first analytics capability, a data engineering team prototyping an analytics layer before committing to a managed cloud platform, or an organisation that needs high-performance columnar analytics on a constrained infrastructure budget. The ClickHouse and Metabase Docker combination addresses exactly this scenario — a columnar OLAP database that queries billions of rows in seconds, paired with a self-serve BI tool that non-technical business users can operate without analyst support, deployable on any machine with Docker installed in under 30 minutes.

Why ClickHouse and Metabase Together

ClickHouse and Metabase solve two complementary problems. ClickHouse solves the query performance problem — it is a column-oriented OLAP database designed specifically for analytical workloads, capable of scanning and aggregating billions of rows in seconds on modest hardware through its columnar storage, vectorised execution engine, and aggressive compression. Metabase solves the distribution problem — it is an open-source BI tool with a question-and-dashboard interface that business users can operate without SQL knowledge, an admin interface that data teams can manage, and a clean connection layer to ClickHouse and dozens of other data sources.

Neither tool requires a cloud account, a managed service subscription, or per-seat licensing in its open-source edition. Running both on a single server or on a developer's laptop via Docker takes minutes and produces a working analytics stack, one that can handle event analytics, log analysis, product analytics, and operational dashboards at scales that would require significantly larger infrastructure on traditional row-oriented databases.

"ClickHouse with Metabase is the analytics stack that eliminates the backlog. Business teams stop waiting for dashboards to be built because they can build their own. Data engineers stop maintaining bespoke report queries because ClickHouse makes ad hoc aggregation fast enough to run live."

What ClickHouse Is and Why It's Fast

ClickHouse is an open-source column-oriented database management system originally developed at Yandex and now maintained by ClickHouse Inc. Its architecture is specifically designed for analytical workloads that scan and aggregate large amounts of data, the type of query that tends to run slowly on row-oriented databases like PostgreSQL or MySQL even when the query is well optimised.

The performance advantage comes from four architectural properties working together. Columnar storage means that a query reading only three columns from a 50-column table reads only those three columns' data from disk, not the full row width, which dramatically reduces I/O for analytical queries. Vectorised execution processes batches of column values together using CPU SIMD instructions rather than processing one row at a time. Aggressive compression (ratios of roughly 6–10× are commonly reported on typical analytical data, though the actual figure depends on the data and the codecs used) means more data fits in memory and cache. And native parallelism across CPU cores means that a complex aggregation query uses the available CPU cores by default, without extra configuration.
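The I/O saving from columnar storage can be sketched with back-of-envelope arithmetic. The Python snippet below is an illustrative model, not a measurement: the function name `bytes_scanned`, the uniform 8-byte value width, and the 1-billion-row table size are assumptions chosen for the example, and the model ignores indexes, caching, and the fact that real columns compress at different rates.

```python
def bytes_scanned(rows, columns_touched, bytes_per_value=8, compression_ratio=1.0):
    """Approximate bytes read from disk for a full scan (simplified model)."""
    return rows * columns_touched * bytes_per_value / compression_ratio


ROWS = 1_000_000_000   # assumed table size
TOTAL_COLUMNS = 50     # table width from the example in the text
COLUMNS_READ = 3       # columns the query actually needs

# A row store reads every column of every row it scans.
row_store = bytes_scanned(ROWS, TOTAL_COLUMNS)
# A column store reads only the columns the query references.
column_store = bytes_scanned(ROWS, COLUMNS_READ)
# The same column scan with an assumed 6x compression ratio.
column_store_compressed = bytes_scanned(ROWS, COLUMNS_READ, compression_ratio=6.0)

print(f"row store:           {row_store / 1e9:.0f} GB")
print(f"column store:        {column_store / 1e9:.0f} GB")
print(f"column store (6x):   {column_store_compressed / 1e9:.0f} GB")
print(f"reduction before compression: {row_store / column_store:.1f}x")
```

Under these assumptions the model reads 400 GB for the row layout against 24 GB for the column layout, and 4 GB once the assumed compression is applied. Real workloads differ, but the proportions show why column pruning and compression compound.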

The result is a database where aggregation queries that take minutes on PostgreSQL often run in seconds or less on ClickHouse for the same data, even on modest hardware. As an illustrative order of magnitude, a 1 billion row event table that takes around 45 seconds to aggregate on a well-indexed PostgreSQL instance can aggregate in roughly a second on ClickHouse on the same hardware; actual timings depend on the schema, the query, and the hardware.

What Metabase Is and Who It Serves

Metabase is an open-source business intelligence tool with two key design priorities: making it easy for non-technical business users to explore data without writing SQL, and making it quick for data teams to deploy and maintain without a dedicated BI platform engineering investment. Its Question builder interface lets business users select a table, apply filters, choose a visualisation type, and save the result as a dashboard card — without SQL. Its SQL editor gives data analysts the option to write raw SQL queries when needed. Its dashboard builder assembles Question results and SQL-based charts into shareable, refreshable dashboards.

Metabase connects to ClickHouse through a dedicated driver built on the ClickHouse JDBC client. The driver began as a community project and is now maintained by ClickHouse, and the combination is well-tested for production use. Metabase exposes ClickHouse tables, runs user queries against ClickHouse, caches results where configured, and serves dashboard results to any authenticated user, with a permission model that controls which users can see which data and which databases.

Prerequisites Before You Begin

The setup requires Docker and Docker Compose installed on the host machine. Docker Desktop (for macOS and Windows) includes both; Linux installations require Docker Engine and the Docker Compose plugin installed separately. Verify the installation by running docker --version and docker compose version in a terminal — both should return version numbers without errors. A minimum of 4 GB of RAM available to Docker is recommended; 8 GB allows comfortable operation with sample datasets of tens of millions of rows.

Step 1 — The Docker Compose File

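Create a project directory and save the following docker-compose.yml file in it. This configuration starts ClickHouse and Metabase as networked services, with a named volume for ClickHouse so that its data survives container restarts and rebuilds. Metabase's own metadata is not given a volume here; the notes after the file explain the implications.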

YAML — docker-compose.yml
version: "3.8"   # Optional: Docker Compose v2 ignores this field and may print an "obsolete" warning

services:

  clickhouse:
    image: clickhouse/clickhouse-server:latest
    container_name: clickhouse
    ports:
      - "8123:8123"   # HTTP interface (used by Metabase JDBC)
      - "9000:9000"   # Native TCP interface (clickhouse-client)
    volumes:
      - clickhouse_data:/var/lib/clickhouse
    environment:
      CLICKHOUSE_DB: analytics
      CLICKHOUSE_USER: default
      CLICKHOUSE_PASSWORD: ""        # Empty for local dev; set in prod
      CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    networks:
      - analytics_net

  metabase:
    image: metabase/metabase:latest
    container_name: metabase
    ports:
      - "3000:3000"
    environment:
      MB_DB_TYPE: h2             # Built-in H2 for dev; use Postgres in prod
      JAVA_TIMEZONE: UTC
    depends_on:
      - clickhouse
    networks:
      - analytics_net

volumes:
  clickhouse_data:

networks:
  analytics_net:
    driver: bridge

Two notes on this configuration. First, the ClickHouse password is left empty for local development convenience — in any non-local environment, set a strong password in both the CLICKHOUSE_PASSWORD environment variable and in the Metabase connection settings below. Second, Metabase uses its built-in H2 database to store its own metadata (questions, dashboards, user accounts) in this configuration — for production use, replace H2 with a PostgreSQL instance as Metabase's application database to avoid data loss if the Metabase container is rebuilt.

Step 2 — Start the Services

From the project directory containing the docker-compose.yml file, run the following command to start both services in detached mode:

Terminal — Start Services
docker compose up -d

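Docker pulls the ClickHouse and Metabase images on the first run (this takes 2–5 minutes depending on connection speed) and starts both containers. On subsequent runs, the images are already local and both services start in seconds. Monitor the startup logs with docker compose logs -f. Metabase is ready when its logs show "Metabase Initialization COMPLETE". ClickHouse reports "Application: Ready for connections" in its server log, but depending on the image's logging configuration that line may be written to a log file inside the container rather than to the container's standard output, so the HTTP check in Step 3 is the more dependable readiness signal for ClickHouse.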
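Instead of watching the logs, readiness can also be polled over HTTP. The following Python sketch assumes the port mappings from the Compose file above and uses ClickHouse's `/ping` endpoint and Metabase's `/api/health` endpoint. The helper names (`http_ok`, `wait_until`, `wait_for_stack`) and the timeout values are illustrative choices, not part of either tool.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_ok(url, timeout_s=2.0):
    """Return True if a GET to `url` answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or an error status: not ready yet.
        return False


def wait_until(probe, timeout_s=180.0, interval_s=2.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call `probe` repeatedly until it returns True or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def wait_for_stack():
    """Block until both services answer; return a name -> ready mapping."""
    endpoints = {
        "ClickHouse": "http://localhost:8123/ping",
        "Metabase": "http://localhost:3000/api/health",
    }
    return {name: wait_until(lambda u=url: http_ok(u))
            for name, url in endpoints.items()}
```

Calling `wait_for_stack()` after `docker compose up -d` blocks until each service responds or the timeout expires. The `sleep` and `clock` parameters exist only so the retry loop can be exercised without real waiting.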

Step 3 — Verify ClickHouse Is Running

Verify the ClickHouse HTTP interface is responding by opening http://localhost:8123 in a browser — a plain text response of "Ok." confirms ClickHouse is up. To run queries interactively, connect via the ClickHouse HTTP interface using curl or via the ClickHouse client built into the container:

Terminal — Verify ClickHouse
# HTTP interface check
curl http://localhost:8123

# Connect via ClickHouse client in the container
docker exec -it clickhouse clickhouse-client

# Run a test query non-interactively (or type SELECT version(); at the client prompt)
docker exec clickhouse clickhouse-client --query "SELECT version()"
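The same HTTP interface on port 8123 can be called from application code. The sketch below, in Python with only the standard library, assumes the local development settings from the Compose file (database `analytics`, user `default`, empty password). The function names `build_request` and `run_query` are hypothetical helpers for illustration.

```python
import urllib.parse
import urllib.request


def build_request(sql, base_url="http://localhost:8123/", database="analytics",
                  user="default", password=""):
    """Build a POST request that sends `sql` to the ClickHouse HTTP interface."""
    params = urllib.parse.urlencode(
        {"database": database, "default_format": "TabSeparated"}
    )
    headers = {"X-ClickHouse-User": user}
    if password:
        # Only send the key header when a password is configured.
        headers["X-ClickHouse-Key"] = password
    return urllib.request.Request(
        f"{base_url}?{params}",
        data=sql.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def run_query(sql, **kwargs):
    """Send the query and return the response body as text."""
    with urllib.request.urlopen(build_request(sql, **kwargs), timeout=10) as resp:
        return resp.read().decode("utf-8")
```

With the stack running, `print(run_query("SELECT version()"))` prints the server version as tab-separated text. Keeping request construction separate from the network call makes the URL and headers easy to inspect before anything is sent.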

Step 4 — Load Sample Data into ClickHouse

Create a sample events table and load some data to verify the stack end-to-end. The following SQL creates a simple clickstream events table using ClickHouse's MergeTree engine — the standard table engine for most analytical use cases — and inserts sample rows:

SQL — Create and Populate Sample Events Table
-- Connect via: docker exec -it clickhouse clickhouse-client

CREATE TABLE IF NOT EXISTS analytics.events
(
    event_id     UInt64,
    event_date   Date,
    event_type   LowCardinality(String),
    user_id      UInt32,
    session_id   UInt64,
    country      LowCardinality(String),
    revenue      Decimal(10, 2)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, event_type, user_id);

-- Insert sample rows
INSERT INTO analytics.events VALUES
    (1, '2026-01-01', 'page_view',  1001, 10001, 'GB', 0.00),
    (2, '2026-01-01', 'purchase',   1001, 10001, 'GB', 49.99),
    (3, '2026-01-02', 'page_view',  1002, 10002, 'US', 0.00),
    (4, '2026-01-02', 'sign_up',    1002, 10002, 'US', 0.00),
    (5, '2026-01-03', 'purchase',   1003, 10003, 'AU', 99.00);

-- Verify
SELECT event_type, COUNT(*) as cnt, SUM(revenue) as total_revenue
FROM analytics.events
GROUP BY event_type
ORDER BY cnt DESC;
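Five rows are enough to confirm the table works, but a larger synthetic dataset makes the dashboards more interesting. The Python sketch below generates rows matching the column order of analytics.events and renders them as CSV. The value ranges, the fixed seed, and the names `generate_events` and `to_csv` are assumptions made for illustration, and the data is random rather than realistic.

```python
import csv
import io
import random
from datetime import date, timedelta

EVENT_TYPES = ["page_view", "sign_up", "purchase"]
COUNTRIES = ["GB", "US", "AU"]


def generate_events(n, first_id=6, start=date(2026, 1, 1), days=30, seed=42):
    """Yield n synthetic rows in the column order of analytics.events."""
    rng = random.Random(seed)  # fixed seed so repeated runs produce the same data
    for event_id in range(first_id, first_id + n):
        event_type = rng.choice(EVENT_TYPES)
        # Only purchases carry revenue; other events are recorded as 0.00.
        revenue = rng.uniform(5, 200) if event_type == "purchase" else 0.0
        yield (
            event_id,
            (start + timedelta(days=rng.randrange(days))).isoformat(),
            event_type,
            rng.randint(1000, 9999),      # user_id
            rng.randint(10000, 99999),    # session_id
            rng.choice(COUNTRIES),
            f"{revenue:.2f}",
        )


def to_csv(rows):
    """Render rows as CSV text suitable for an INSERT ... FORMAT CSV statement."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

One way to load the output, assuming the block is saved as a script (here called gen_events.py) with a final line such as `print(to_csv(generate_events(100_000)), end="")`, is to pipe it into the client: `python gen_events.py | docker exec -i clickhouse clickhouse-client --query "INSERT INTO analytics.events FORMAT CSV"`.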

Step 5 — Connect Metabase to ClickHouse

Open Metabase at http://localhost:3000. On first access, Metabase runs a setup wizard — create an admin account with your email and a password, then proceed to the database connection step.

To add ClickHouse as a database in Metabase, navigate to Settings → Admin → Databases → Add a database. Select ClickHouse from the database type dropdown (the ClickHouse driver is included in recent Metabase versions; if not listed, it can be added as a plugin JAR from the ClickHouse Metabase driver releases). Enter the connection details:

Metabase — ClickHouse Connection Settings
Database type:    ClickHouse
Display name:     ClickHouse Analytics
Host:             clickhouse          (Docker service name — not localhost)
Port:             8123
Database name:    analytics
Username:         default
Password:         (leave blank for local dev config above)
Use a secure connection (SSL): OFF (for local dev)

Click Save. Metabase runs a connection test — on success, the ClickHouse database appears in the left panel when creating a new Question. Navigate to New → Question, select the ClickHouse Analytics database, pick the events table, and the question builder loads the table's columns for exploration. Business users can now filter by country, group by event_type, visualise revenue as a bar chart, and save the result to a dashboard — no SQL required.

When to Use This Stack vs Microsoft Fabric and Power BI

Consideration | ClickHouse + Metabase (Docker) | Microsoft Fabric + Power BI
Licence cost | Open source: no licence cost for core stack | Fabric capacity + Power BI licences required
Setup time | Under 30 minutes with Docker Compose | Hours to days for full enterprise setup
Query performance at scale | Excellent: sub-second on billions of rows | Excellent: Fabric Lakehouse + DirectLake
Enterprise governance | Manual: no built-in sensitivity labels or RLS governance tooling | Full: Purview integration, RLS, certifications, audit logs
Self-serve analytics | Good: Metabase question builder for non-SQL users | Excellent: Power BI Copilot, Explore, certified datasets
Microsoft ecosystem integration | Custom connectors required | Native: Teams, SharePoint, Azure AD, Office 365
Best for | Startups, prototypes, event analytics, cost-constrained deployments | Enterprise BI, regulated industries, multi-source cross-functional analytics

Next Steps and Production Considerations

Once the ClickHouse and Metabase Docker stack is running locally and validated with sample data, three steps make it production-ready for team use. First, add a PostgreSQL service to the Docker Compose file as Metabase's application database — Metabase's H2 built-in database is not suitable for production because it does not support concurrent connections well and loses data if the container is rebuilt without a volume. Second, set a strong ClickHouse password and configure Metabase's connection to use it — ClickHouse's HTTP interface is accessible on port 8123 and should not be exposed publicly without authentication. Third, consider reverse proxy configuration (Nginx or Traefik in Docker) to serve Metabase at a domain name with HTTPS rather than directly on port 3000.

For teams evaluating whether to build on this open-source stack long-term or to migrate to a managed analytics platform as the organisation scales, the decision point is typically governance complexity and cross-system data integration requirements. When those requirements grow beyond what ClickHouse and Metabase manage natively — sensitivity label enforcement, Power BI Copilot for business users, integration with Microsoft 365, or enterprise audit logging — Microsoft Fabric and Power BI become the more appropriate foundation. For guidance on that transition, see our post on the SAP to Microsoft Fabric integration and our overview of Fabric Data Pipelines and Fast Copy.

If your organisation is evaluating analytics infrastructure options — from open-source stacks through to enterprise Microsoft Fabric deployments — speak with a Numlytics data engineering consultant to discuss the right architecture for your scale, governance requirements, and team capabilities.