Building a distributed ClickHouse cluster across multiple regions provides exceptional data availability, improved query performance through data locality, and robust disaster recovery capabilities. In this comprehensive tutorial, we’ll deploy a production-ready ClickHouse 24.x cluster spanning Amsterdam and New York VPS instances with advanced features including sharding, ClickHouse Keeper-based replication, and zero-downtime failover mechanisms.
This setup leverages geographical distribution to minimize query latencies for users in both European and North American markets while ensuring data consistency and high availability through sophisticated replication strategies.
Prerequisites
Before beginning this deployment, ensure you have:
- 4 VPS instances: 2 in Amsterdam and 2 in New York (minimum 4GB RAM, 2 vCPUs each)
- Ubuntu 24.04 LTS installed on all servers
- Root access to all VPS instances
- Network connectivity between all nodes (ports 9000 for the native protocol, 9009 for interserver replication, 9181 for Keeper clients, and 9444 for Keeper Raft traffic; see the firewall sketch after this list)
- Basic knowledge of SQL and distributed systems concepts
- Domain names or static IPs for each node
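If the nodes sit behind UFW, a minimal sketch for opening those ports between peers looks like the following; the source address is a placeholder, so repeat the rule for each real node address or your private subnet:
# Allow ClickHouse and Keeper traffic from a peer node (repeat for each peer)
# 203.0.113.10 is a placeholder - replace it with the address of the other nodes
for port in 9000 9009 9181 9444; do
  sudo ufw allow from 203.0.113.10 to any port "$port" proto tcp
done
sudo ufw reload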
For optimal performance in financial or trading applications, consider our Onidel VPS in Amsterdam and Onidel VPS in New York with high-performance EPYC Milan processors and low-latency connectivity to major exchanges.
Cluster Architecture Overview
Our multi-region ClickHouse deployment consists of:
- Amsterdam Region: 2 ClickHouse nodes (ch-ams-01, ch-ams-02)
- New York Region: 2 ClickHouse nodes (ch-ny-01, ch-ny-02)
- ClickHouse Keeper: 3 nodes for coordination (distributed across regions)
- Sharding Strategy: Data distribution by hash key
- Replication: Cross-region replicas for disaster recovery
This architecture provides both horizontal scaling through sharding and high availability through replication, similar to patterns used in geo-sharded MongoDB deployments. Note that two of the three Keeper nodes live in Amsterdam, so writes to replicated tables depend on Amsterdam remaining reachable; for stricter regional fault tolerance, place the third Keeper node in a neutral third location.
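The hostnames used throughout this guide (ch-ams-01, ch-ams-02, ch-ny-01, ch-ny-02) must resolve from every node. If you are not using DNS, a quick sketch is to add /etc/hosts entries on each server; the IP addresses below are placeholders for your own:
# Run on every node; replace the placeholder IPs with your real addresses
sudo tee -a /etc/hosts <<'EOF'
203.0.113.11  ch-ams-01
203.0.113.12  ch-ams-02
198.51.100.11 ch-ny-01
198.51.100.12 ch-ny-02
EOF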
Step 1: Install ClickHouse 24.x
Install ClickHouse 24.x on all four VPS instances:
# Update system packages
sudo apt update && sudo apt upgrade -y
# Install prerequisites, then add the ClickHouse GPG key and repository
sudo apt install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | sudo gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
# Install ClickHouse server and client
sudo apt update
sudo apt install -y clickhouse-server clickhouse-client
# Verify installation
clickhouse-server --version
Step 2: Deploy ClickHouse Keeper Cluster
First, configure ClickHouse Keeper for coordination on three of the nodes: ch-ams-01, ch-ams-02, and ch-ny-01. Create the Keeper configuration on each of them:
# /etc/clickhouse-server/config.d/keeper.xml
<clickhouse>
<keeper_server>
<tcp_port>9181</tcp_port>
<server_id>1</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/logs</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<coordination_settings>
<raft_logs_level>information</raft_logs_level>
<rotate_log_storage_interval>10000</rotate_log_storage_interval>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>ch-ams-01</hostname>
<port>9444</port>
</server>
<server>
<id>2</id>
<hostname>ch-ams-02</hostname>
<port>9444</port>
</server>
<server>
<id>3</id>
<hostname>ch-ny-01</hostname>
<port>9444</port>
</server>
</raft_configuration>
</keeper_server>
</clickhouse>
Note: Deploy this file only on the three Keeper nodes, setting server_id to 2 on ch-ams-02 and 3 on ch-ny-01 to match the raft_configuration above. Create the coordination directories:
sudo mkdir -p /var/lib/clickhouse/coordination/logs
sudo mkdir -p /var/lib/clickhouse/coordination/snapshots
sudo chown -R clickhouse:clickhouse /var/lib/clickhouse/coordination
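In this layout Keeper runs embedded in the clickhouse-server process, so it starts with the service in Step 5. Once the servers are up, a quick way to confirm quorum is the four-letter-word commands on the Keeper client port (this assumes the default command allow list, which includes ruok and mntr):
# "imok" confirms this Keeper instance is responding
echo ruok | nc ch-ams-01 9181
# mntr reports the Raft role and follower count; run it against each Keeper node in turn
echo mntr | nc ch-ams-01 9181 | grep -E 'zk_server_state|zk_followers'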
Step 3: Configure Multi-Region Sharding
Create the cluster configuration for distributed sharding across regions:
# /etc/clickhouse-server/config.d/clusters.xml
<clickhouse>
<remote_servers>
<global_cluster>
<shard>
<replica>
<host>ch-ams-01</host>
<port>9000</port>
</replica>
<replica>
<host>ch-ny-01</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>ch-ams-02</host>
<port>9000</port>
</replica>
<replica>
<host>ch-ny-02</host>
<port>9000</port>
</replica>
</shard>
</global_cluster>
</remote_servers>
<zookeeper>
<node>
<host>ch-ams-01</host>
<port>9181</port>
</node>
<node>
<host>ch-ams-02</host>
<port>9181</port>
</node>
<node>
<host>ch-ny-01</host>
<port>9181</port>
</node>
</zookeeper>
</clickhouse>
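Deploy this file to all four nodes. After the servers are started in Step 5, you can confirm that every node sees the same topology by reading system.clusters:
clickhouse-client --query "
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'global_cluster'"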
Step 4: Configure Node-Specific Settings
Configure each node with its specific macros and settings:
# /etc/clickhouse-server/config.d/macros.xml (Amsterdam nodes)
<clickhouse>
<macros>
<cluster>global_cluster</cluster>
<shard>01</shard> <!-- 01 for ch-ams-01, 02 for ch-ams-02 -->
<replica>ch-ams-01</replica> <!-- Adjust per node -->
<region>amsterdam</region>
</macros>
</clickhouse>
# /etc/clickhouse-server/config.d/macros.xml (New York nodes)
<clickhouse>
<macros>
<cluster>global_cluster</cluster>
<shard>01</shard> <!-- 01 for ch-ny-01, 02 for ch-ny-02 -->
<replica>ch-ny-01</replica> <!-- Adjust per node -->
<region>newyork</region>
</macros>
</clickhouse>
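A quick sanity check once the servers are running is to read the macros back from each node; every node should report its own shard and replica values:
for node in ch-ams-01 ch-ams-02 ch-ny-01 ch-ny-02; do
  echo "--- $node ---"
  clickhouse-client --host "$node" --query "SELECT macro, substitution FROM system.macros"
done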
Step 5: Create Replicated Tables with Sharding
Start ClickHouse services and create distributed tables:
# Start ClickHouse on all nodes
sudo systemctl start clickhouse-server
sudo systemctl enable clickhouse-server
Create a replicated and sharded table:
-- Connect to any node and create the local table
CREATE TABLE events_local ON CLUSTER global_cluster (
timestamp DateTime64(3),
user_id UInt64,
event_type String,
region String,
data String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id);
-- Create distributed table for queries
CREATE TABLE events_distributed ON CLUSTER global_cluster AS events_local
ENGINE = Distributed(global_cluster, default, events_local, cityHash64(user_id));
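To confirm that every replica registered correctly with Keeper, check system.replicas on each node; is_readonly should be 0 and active_replicas should match total_replicas:
clickhouse-client --query "
SELECT database, table, is_readonly, total_replicas, active_replicas
FROM system.replicas
WHERE table = 'events_local'"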
Step 6: Implement Zero-Downtime Failover
Distributed queries automatically retry against the other replica of a shard when one host is unreachable. Tune that behaviour with query-level settings in a user profile, and pair it with external health checks for operational visibility:
# /etc/clickhouse-server/users.d/failover.xml
<clickhouse>
<profiles>
<default>
<load_balancing>round_robin</load_balancing>
<max_replica_delay_for_distributed_queries>300</max_replica_delay_for_distributed_queries>
<fallback_to_stale_replicas_for_distributed_queries>1</fallback_to_stale_replicas_for_distributed_queries>
</default>
</profiles>
</clickhouse>
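These are query-level settings, so after the configuration has been reloaded (a service restart is the simplest way) you can confirm they are active from any client session:
clickhouse-client --query "
SELECT name, value
FROM system.settings
WHERE name IN ('load_balancing', 'max_replica_delay_for_distributed_queries')"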
Create a health monitoring script for automated failover:
#!/bin/bash
# /opt/clickhouse-monitor.sh
CLUSTER_NODES=("ch-ams-01" "ch-ams-02" "ch-ny-01" "ch-ny-02")
HEALTHY_NODES=()
for node in "${CLUSTER_NODES[@]}"; do
if clickhouse-client --host="$node" --query="SELECT 1" &>/dev/null; then
HEALTHY_NODES+=("$node")
echo "$(date): $node is healthy"
else
echo "$(date): $node is down - triggering failover procedures"
# Add your failover logic here
fi
done
echo "Healthy nodes: ${HEALTHY_NODES[*]}"
Step 7: Test Cluster Functionality
Verify the cluster setup and test cross-region replication:
-- Test data insertion
INSERT INTO events_distributed VALUES
(now(), 1001, 'page_view', 'amsterdam', '{"page": "/home"}'),
(now(), 1002, 'click', 'newyork', '{"button": "signup"}');
-- Verify data distribution across all replicas (inserts through the Distributed
-- table are asynchronous by default, so allow a moment for rows to arrive)
SELECT hostName(), count() FROM clusterAllReplicas('global_cluster', default.events_local) GROUP BY hostName();
-- Test cross-region queries
SELECT region, count() FROM events_distributed GROUP BY region;
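To exercise the failover path end to end, stop one replica, confirm that distributed queries still succeed from the surviving nodes, then bring the stopped node back and let it catch up:
# On ch-ams-01: simulate a node outage
sudo systemctl stop clickhouse-server

# From any surviving node: the distributed query still returns the full result
clickhouse-client --host ch-ny-01 --query "SELECT count() FROM events_distributed"

# On ch-ams-01: bring the node back; it resynchronises automatically via Keeper
sudo systemctl start clickhouse-server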
Best Practices
Performance Optimization
- Index Strategy: Use appropriate primary keys and skip indexes based on query patterns
- Compression: Enable ZSTD compression for better storage efficiency
- Memory Settings: Configure max_memory_usage and max_bytes_before_external_group_by
- Connection Pooling: Implement connection pooling in your applications
Security Considerations
- TLS Encryption: Enable TLS for inter-node communication
- Authentication: Configure user authentication and role-based access control
- Network Security: Use VPN or private networks between regions
- Regular Backups: Implement backup strategies using ClickHouse’s built-in backup tools
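As a sketch of the built-in BACKUP command, the example below writes a table backup to a disk named backups; that disk is an assumption and must first be declared in the server's storage configuration and allowed as a backup destination:
# Assumes a disk named 'backups' is declared under <storage_configuration>
# and permitted via <backups><allowed_disk>backups</allowed_disk> in the server config
clickhouse-client --query "BACKUP TABLE default.events_local TO Disk('backups', 'events_local_$(date +%F).zip')"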
Monitoring and Maintenance
- Metrics Collection: Monitor system tables like system.metrics and system.events
- Query Performance: Track slow queries via system.query_log
- Replication Lag: Monitor cross-region replication delays
- Resource Usage: Track CPU, memory, and disk usage patterns
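The queries below, run with clickhouse-client from any node, are illustrative starting points for each of these areas (exact metric names can vary between ClickHouse releases):
# Memory pressure and background merge activity
clickhouse-client --query "SELECT metric, value FROM system.metrics WHERE metric LIKE 'Memory%' OR metric LIKE 'Background%'"

# Ten slowest recent queries
clickhouse-client --query "
SELECT event_time, query_duration_ms, substring(query, 1, 80) AS query_head
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY query_duration_ms DESC
LIMIT 10"

# Replication delay and queue depth per replicated table
clickhouse-client --query "SELECT database, table, absolute_delay, queue_size FROM system.replicas"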
For comprehensive observability, consider integrating with monitoring solutions as described in our observability stack deployment guide.
Conclusion
This multi-region ClickHouse cluster deployment provides a robust foundation for high-performance analytics workloads spanning Amsterdam and New York. The configuration ensures data availability, query performance optimization through geographical distribution, and automatic failover capabilities for production resilience.
The sharding strategy distributes data effectively across regions while maintaining consistency through ClickHouse Keeper coordination. This architecture is particularly beneficial for applications requiring low-latency access from both European and North American markets, such as financial analytics, real-time monitoring, or global e-commerce platforms.
Ready to deploy your own multi-region ClickHouse cluster? Our Amsterdam VPS and New York VPS offerings provide the high-performance infrastructure needed for demanding analytical workloads, featuring EPYC Milan processors, high-availability NVMe storage, and optimized network connectivity for cross-region deployments.




