javigs82.clickhouse

Ansible Clickhouse

This role sets up and configures a Clickhouse cluster with N shards and M replicas. It is tested using apt and yum with Molecule on Vagrant.

The role relies on Ansible inventory groups, referenced through inventory patterns. The following groups are required to run the cluster:

  • clickhouse: All Clickhouse hosts. Each host's hostname must follow the pattern ^ch\d{2}-shard\d{2}-replica\d{2}, for example ch01-shard01-replica01.
  • zookeeper: This group contains all Zookeeper hosts.

See vars to understand how shards and replicas are derived from hostnames.

Note: {{ inventory_hostname }} is the host's name as written in the inventory (a DNS name or IP address), while {{ ansible_hostname }} is the machine's own hostname.

To make sure the hostname ({{ ansible_hostname }}) is set correctly, run this command on the host machine:

hostname
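A quick way to check a host before running the role is a small Python script. This is a hypothetical helper, not part of the role: it approximates ansible_hostname as the part of the machine name before the first dot and tests it against the pattern above (the pattern has no trailing $, so extra suffixes still match).

```python
import re
import socket

# Hostname pattern from this README (no trailing "$", so suffixes are tolerated).
HOSTNAME_PATTERN = re.compile(r"^ch\d{2}-shard\d{2}-replica\d{2}")


def short_hostname(name: str) -> str:
    """Return the part before the first dot, like the ansible_hostname fact."""
    return name.split(".", 1)[0]


def is_valid_cluster_hostname(name: str) -> bool:
    """True if the short hostname follows the chXX-shardYY-replicaZZ scheme."""
    return HOSTNAME_PATTERN.match(short_hostname(name)) is not None


if __name__ == "__main__":
    current = socket.gethostname()
    verdict = "matches" if is_valid_cluster_hostname(current) else "does NOT match"
    print(f"{current} {verdict} the cluster hostname pattern")
```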

Requirements

Install Molecule and its Vagrant driver with Python 3:

python3 -m pip install --user "molecule"
python3 -m pip install --user "molecule-vagrant"

This solution assumes hostnames are resolved by a private DNS server. Since Vagrant does not provide DNS, the Molecule prepare.yml playbook installs an mDNS resolver on each VM:

---
- name: Prepare
  hosts: all
  become: true
  tasks:
    - name: Install epel-release
      yum:
        name: epel-release
        state: present

    # nss-mdns (from EPEL) plugs mDNS into the system resolver; avahi answers .local queries
    - name: Install nss-mdns and avahi
      yum:
        name:
          - nss-mdns
          - avahi
        state: present

    - name: Start and enable avahi-daemon
      systemd:
        name: avahi-daemon
        state: started
        enabled: true

With nss-mdns and avahi installed, the Vagrant machines can resolve each other's hostnames under the .local domain.
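To check name resolution from inside a VM, the system resolver can be queried from Python: socket.getaddrinfo goes through the NSS configuration, so nss-mdns takes part in the lookup. A minimal sketch, with resolve as a hypothetical helper:

```python
import socket
from typing import List


def resolve(name: str) -> List[str]:
    """Return the distinct addresses the system resolver (NSS, incl. nss-mdns) gives for name."""
    infos = socket.getaddrinfo(name, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

For example, resolve("ch01-shard01-replica02.local") run on another VM should return that machine's address, and raises socket.gaierror if mDNS resolution is not working.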

Note: DNS resolution in real-world scenarios is usually more complex.

For instructions on setting a Vagrant machine's hostname, see:

https://www.vagrantup.com/docs/networking/basic_usage#setting-hostname

Architecture

The Clickhouse cluster layout is derived from hostnames, so make sure each hostname matches this pattern:

^ch\d{2}-shard\d{2}-replica\d{2}

A hostname of the form chXX-shardYY-replicaZZ lets the role work out, through a regex, which shard each replica belongs to.
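As an illustration of the discovery step, the sketch below applies the same expression as clickhouse_hostname_regex in vars using Python's re module (Ansible's regex filters are built on it). The function name shard_and_replica is hypothetical.

```python
import re
from typing import Tuple

# Same expression as clickhouse_hostname_regex in vars ("\\d" in YAML is "\d" here).
CLICKHOUSE_HOSTNAME_REGEX = re.compile(r"^ch\d{2}-(shard\d{2})-(replica\d{2})")


def shard_and_replica(hostname: str) -> Tuple[str, str]:
    """Return (shard, replica) taken from capture groups 1 and 2."""
    match = CLICKHOUSE_HOSTNAME_REGEX.match(hostname)
    if match is None:
        raise ValueError(f"{hostname!r} does not follow chXX-shardYY-replicaZZ")
    return match.group(1), match.group(2)


for name in ("ch01-shard01-replica01", "ch01-shard01-replica02", "ch01-shard02-replica01"):
    shard, replica = shard_and_replica(name)
    print(f"{name}: shard={shard} replica={replica}")
```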

Note: ansible_hostname is different from inventory_hostname:

  • ansible_hostname: the hostname configured in the OS; check it by running hostname on the host.
  • inventory_hostname: the address (IP or DNS name) of the host as listed in the inventory, which must be resolvable by the other cluster replicas.

The role uses {{ ansible_hostname }} to discover shards and replicas, and {{ inventory_hostname }} for communication between them.
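To make the split between discovery and communication concrete, here is a hypothetical Python sketch: shards are discovered from each host's ansible_hostname, while the <remote_servers> entries that replicas use to reach each other carry the inventory_hostname. The element layout follows Clickhouse's remote_servers configuration, but the role's actual template may differ; the example hostvars and the port 9000 (clickhouse_tcp_port) are assumed inputs.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List

HOSTNAME_REGEX = re.compile(r"^ch\d{2}-(shard\d{2})-(replica\d{2})")


def group_by_shard(hostvars: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map shard name -> inventory_hostnames; shards come from ansible_hostname."""
    shards: Dict[str, List[str]] = {}
    for inventory_hostname, facts in hostvars.items():
        match = HOSTNAME_REGEX.match(facts["ansible_hostname"])
        if match is None:
            raise ValueError(f"{inventory_hostname}: unexpected hostname {facts['ansible_hostname']!r}")
        shards.setdefault(match.group(1), []).append(inventory_hostname)
    return dict(sorted(shards.items()))


def remote_servers_xml(cluster: str, shards: Dict[str, List[str]], tcp_port: int = 9000) -> str:
    """Render a <remote_servers> fragment; replicas are addressed by inventory_hostname."""
    root = ET.Element("remote_servers")
    cluster_el = ET.SubElement(root, cluster)
    for replicas in shards.values():
        shard_el = ET.SubElement(cluster_el, "shard")
        for host in replicas:
            replica_el = ET.SubElement(shard_el, "replica")
            ET.SubElement(replica_el, "host").text = host
            ET.SubElement(replica_el, "port").text = str(tcp_port)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Example inputs: inventory_hostname -> gathered facts (only ansible_hostname is used).
    example_hostvars = {
        "10.0.0.11": {"ansible_hostname": "ch01-shard01-replica01"},
        "10.0.0.12": {"ansible_hostname": "ch01-shard01-replica02"},
        "10.0.0.21": {"ansible_hostname": "ch01-shard02-replica01"},
        "10.0.0.22": {"ansible_hostname": "ch01-shard02-replica02"},
    }
    print(remote_servers_xml("mycluster", group_by_shard(example_hostvars)))
```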

An example inventory could look like this:

[clickhouse]
<URL-ch01-shard01-replica01> ansible_host=<ip>
<URL-ch01-shard01-replica02> ansible_host=<ip>
...
<URL-ch01-shard02-replica01> ansible_host=<ip>
<URL-ch01-shard02-replica02> ansible_host=<ip>
...
<URL-chX-shardY-replicaZ> ansible_host=<ip>

[zookeeper]
<URL-zookeeper01> ansible_host=<ip>
...
<URL-zookeeperN> ansible_host=<ip>

Here, each URL (the inventory_hostname) can be either an IP address or a DNS name that is resolved at runtime.

For the cluster setup, both the clickhouse and zookeeper groups are required.

Design

  • Download: Packages come from the Clickhouse (Yandex) repositories: RPM for yum, DEB for apt. Downgrades can be allowed with clickhouse_allow_downgrade.
  • Config: Ensures the clickhouse group and user exist, sets up paths and configuration files, and discovers replicas and shards using a regex.
  • Install: Downloads and installs the packages using yum or apt.
  • Users: Manages a dynamic list of users (password management is not implemented).
  • RBAC: Not implemented yet.

Role Default Variables

Check the variables in defaults.

Cluster Definition

Use these variables to define the main setup. Because the cluster layout depends on hostnames, clickhouse_shard_name and clickhouse_replica_name are derived from them using clickhouse_hostname_regex: ^ch\d{2}-(shard\d{2})-(replica\d{2}). See vars for more details.

# Clickhouse cluster definition
clickhouse_version: "20.8.7.15"
clickhouse_allow_downgrade: false
clickhouse_cluster_name: "mycluster"

clickhouse_service_name: "clickhouse-server"
clickhouse_service_status: "started"

# User and group details
clickhouse_group: "clickhouse"
clickhouse_user: "clickhouse"

Clickhouse Installation Support

# Yum support
clickhouse_yum_repo: "https://repo.clickhouse.tech/rpm/stable/x86_64/"
clickhouse_yum_repo_key: "https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG"
clickhouse_yum_package:
  - "clickhouse-client-{{ clickhouse_version }}"
  - "clickhouse-common-static-{{ clickhouse_version }}"
  - "clickhouse-server-{{ clickhouse_version }}"

# Apt support
clickhouse_apt_repo: "deb https://repo.clickhouse.tech/deb/stable/ main/"
clickhouse_apt_repo_keyserver: "keyserver.ubuntu.com"
clickhouse_apt_repo_key: "E0C56BD4"
clickhouse_apt_package:
  - "clickhouse-client={{ clickhouse_version }}"
  - "clickhouse-common-static={{ clickhouse_version }}"
  - "clickhouse-server={{ clickhouse_version }}"

# Paths for configuration
clickhouse_path_config: "/etc/clickhouse-server"
clickhouse_path_config_d: "{{ clickhouse_path_config }}/config.d"
clickhouse_path_log: "/var/log/clickhouse-server"
clickhouse_path_data: "/var/lib/clickhouse"

Clickhouse Configuration

These variables set the main configuration.

clickhouse_config:
  max_connections: 2048
  keep_alive_timeout: 3
  max_concurrent_queries: 100
  uncompressed_cache_size: 8589934592  # 8 GiB
  mark_cache_size: 5368709120          # 5 GiB
  builtin_dictionaries_reload_interval: 3600
  max_session_timeout: 3600
  default_session_timeout: 60
  mlock_status: false
  merge_tree_config: []

Networking

These variables relate to networking configuration.

clickhouse_http_port: 8123
clickhouse_https_port: 8443
clickhouse_tcp_port: 9000
clickhouse_tcp_secure_port: 9440
clickhouse_interserver_http: 9009

# Variable for listener host
clickhouse_listen_host: "{{ clickhouse_listen_host_default + clickhouse_listen_host_custom }}"

Note: clickhouse_listen_host must include an address that the other cluster members can reach.
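One way to confirm this is to open TCP connections to the configured ports from another cluster member. A minimal Python sketch, with port_reachable and check_peer as hypothetical helpers and the port numbers taken from the defaults above:

```python
import socket
from typing import Dict

# Ports from the role defaults (clickhouse_http_port, clickhouse_tcp_port, clickhouse_interserver_http).
CLICKHOUSE_PORTS: Dict[str, int] = {"http": 8123, "tcp": 9000, "interserver_http": 9009}


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name not resolvable
        return False


def check_peer(host: str) -> Dict[str, bool]:
    """Report, per Clickhouse port, whether host accepts TCP connections on it."""
    return {label: port_reachable(host, port) for label, port in CLICKHOUSE_PORTS.items()}
```

For example, running check_peer("ch01-shard01-replica02.local") on ch01-shard01-replica01 should report every port as reachable once the service is started and listening on an address the peer can use.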

Users

Use these variables to customize users. To remove a user, use the remove attribute described in the Clickhouse configuration files documentation: https://clickhouse.tech/docs/en/operations/configuration-files/

# Clickhouse users: https://clickhouse.tech/docs/en/operations/configuration-files/
clickhouse_users_list:
  - { user_name: "default",
      profile: "default",
      networks: ["::/1"],
      quota: "default" }
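The networks entry is a CIDR range, and "::/1" is narrower than it may look: it covers only IPv6 addresses whose first bit is 0. Python's ipaddress module shows which addresses fall inside it; the sample addresses are illustrative.

```python
import ipaddress

# networks entry of the "default" user in clickhouse_users_list.
DEFAULT_USER_NETWORK = ipaddress.ip_network("::/1")


def covered(address: str, network: ipaddress.IPv6Network = DEFAULT_USER_NETWORK) -> bool:
    """True if address lies inside network (IPv4 strings never match an IPv6 network)."""
    return ipaddress.ip_address(address) in network


for sample in ("::1", "::ffff:192.168.56.10", "2001:db8::1", "fd00::10", "fe80::1"):
    print(f"{sample:>22}: {'covered' if covered(sample) else 'not covered'}")
```

Loopback (::1), IPv4-mapped addresses (::ffff:0:0/96) and most global addresses fall inside ::/1, while unique-local (fc00::/7) and link-local (fe80::/10) addresses do not. For comparison, Clickhouse's stock users.xml uses ::/0 for the default user, which covers every IPv6 address.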

Zookeeper

The Zookeeper host list is taken from the zookeeper inventory group.

# Zookeeper is optional. If not installed, replication must be done client-side.
clickhouse_zookeeper_list: "{{ groups['zookeeper'] }}"
clickhouse_zookeeper_port: "2181"
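Clickhouse reads its Zookeeper ensemble from a <zookeeper> section with one <node> per server. A hypothetical Python sketch of how clickhouse_zookeeper_list and clickhouse_zookeeper_port could map onto that section (the role's own template may render it differently; the host names are examples):

```python
import xml.etree.ElementTree as ET
from typing import List


def zookeeper_xml(hosts: List[str], port: str = "2181") -> str:
    """Render a Clickhouse <zookeeper> section with one indexed <node> per host."""
    root = ET.Element("zookeeper")
    for index, host in enumerate(hosts, start=1):
        node = ET.SubElement(root, "node", index=str(index))
        ET.SubElement(node, "host").text = host
        ET.SubElement(node, "port").text = port
    return ET.tostring(root, encoding="unicode")


# Example: groups['zookeeper'] with three hypothetical hosts.
print(zookeeper_xml(["zookeeper01", "zookeeper02", "zookeeper03"]))
```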

Role Vars Variables

These variables have higher precedence than role defaults and inventory group variables. They can only be overridden by variables with even higher precedence, and typically should not be overridden at all.

The cluster setup depends on hostnames: clickhouse_shard_name and clickhouse_replica_name are the first and second capture groups of clickhouse_hostname_regex, ^ch\d{2}-(shard\d{2})-(replica\d{2}), applied to ansible_hostname. See vars for more info.

Check the variables in vars.

---
# Regex for discovering shards and replicas
clickhouse_hostname_regex: "^ch\\d{2}-(shard\\d{2})-(replica\\d{2})"
# Discovery using regex
clickhouse_shard_name: "{{ ansible_hostname | regex_search(clickhouse_hostname_regex, '\\1') | first }}"
clickhouse_replica_name: "{{ ansible_hostname | regex_search(clickhouse_hostname_regex, '\\2') | first }}"

# Shard list is calculated from the replica list. Must match regex: see clickhouse_hostname_regex in vars/main.yml
clickhouse_shard_list: "{{ clickhouse_replica_list | map('extract', hostvars, 'ansible_hostname') | map('regex_search', clickhouse_hostname_regex, '\\1') | unique | map ('first') }}"
clickhouse_replica_list: "{{ groups['clickhouse'] }}"

clickhouse_listen_host_default:
  - "{{ inventory_hostname }}"
  - "127.0.0.1"
  - "::1"

These should NEVER be modified as they are essential for how the role identifies and links replicas with shards.
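To see what the clickhouse_shard_list expression computes, here is a step-by-step Python equivalent. It is an illustration only: regex_search_group is a hypothetical stand-in for Ansible's regex_search with a '\\1' backreference argument, and hostvars is reduced to the ansible_hostname fact.

```python
import re
from typing import Dict, List, Optional

# clickhouse_hostname_regex from vars ("\\d" in the YAML string is the regex "\d").
CLICKHOUSE_HOSTNAME_REGEX = r"^ch\d{2}-(shard\d{2})-(replica\d{2})"


def regex_search_group(value: str, pattern: str, group: int) -> Optional[List[str]]:
    """Stand-in for regex_search(pattern, backref): [group text] on a match, else None."""
    match = re.search(pattern, value)
    return [match.group(group)] if match else None


def shard_list(replica_list: List[str], hostvars: Dict[str, Dict[str, str]]) -> List[str]:
    """Step-by-step Python version of the clickhouse_shard_list expression."""
    # map('extract', hostvars, 'ansible_hostname')
    hostnames = [hostvars[host]["ansible_hostname"] for host in replica_list]
    # map('regex_search', clickhouse_hostname_regex, '\\1')
    matches = [regex_search_group(name, CLICKHOUSE_HOSTNAME_REGEX, 1) for name in hostnames]
    if None in matches:
        raise ValueError("every replica hostname must match clickhouse_hostname_regex")
    # unique: keep the first occurrence of each result, in order
    distinct: List[List[str]] = []
    for item in matches:
        if item not in distinct:
            distinct.append(item)
    # map('first')
    return [item[0] for item in distinct]
```

Because unique runs on the one-element lists before first unwraps them, the shard order follows the order of the clickhouse group in the inventory.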

Role Tags

The following tags can be used in this role:

  • ch:configure: Run only configuration tasks.
  • ch:install: Download and install the software.
  • ch:service: Manage systemctl service status.

Dependencies

The Clickhouse cluster depends on Zookeeper to coordinate replication between replicas.

Example Playbook

Here's an example of how to use this role:

    - hosts: my_clickhouse_group
      tasks:
        - include_role:
            name: javigs82.clickhouse
          vars:
            clickhouse_cluster_name: "e-commerce"
            clickhouse_replica_list: "{{ groups['my_clickhouse_group'] }}"
            clickhouse_zookeeper_list: "{{ groups['my_zookeeper_group'] }}"

References

Author

Acknowledgments

License

This project is licensed under the MIT License - see the LICENSE file for details.

About the project

Clickhouse Cluster

Install
ansible-galaxy install javigs82.clickhouse