javigs82.clickhouse
Ansible Clickhouse
This role sets up and configures a Clickhouse cluster with N shards and M replicas. It is tested using apt and yum with Molecule on Vagrant.
The cluster is defined through Ansible inventory groups, which are addressed using inventory patterns. The following groups are required to run the cluster:
- clickhouse: This group includes all Clickhouse hosts. Each host must have a hostname formatted like ch01-shard01-replica01, following the pattern: ^ch\\d{2}-shard\\d{2}-replica\\d{2}
- zookeeper: This group contains all Zookeeper hosts.
Check the role's vars file to see how shards and replicas are derived from hostnames.
Note: {{ inventory_hostname }} refers to the DNS name or IP address, while {{ ansible_hostname }} is the machine's hostname.
To make sure the hostname ({{ ansible_hostname }}) is set correctly, run this command on the host machine:
hostname
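The same check can be run against the whole group from the control node. The play below is a minimal sketch and not part of the role: the variable name expected_hostname_regex is hypothetical, and its value is copied from the pattern given above.

```yaml
---
- name: Verify Clickhouse hostnames before running the role
  hosts: clickhouse
  gather_facts: true   # needed so that ansible_hostname is populated
  vars:
    # Hypothetical variable; single quotes keep the backslashes literal in YAML.
    expected_hostname_regex: '^ch\d{2}-shard\d{2}-replica\d{2}'
  tasks:
    - name: Fail early if the machine hostname does not follow the pattern
      ansible.builtin.assert:
        that:
          - ansible_hostname is match(expected_hostname_regex)
        fail_msg: "Hostname {{ ansible_hostname }} does not match {{ expected_hostname_regex }}"
```

Running this before the role makes a misnamed host fail with a readable message instead of a templating error later on.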
Requirements
- Vagrant with VirtualBox (see https://www.vagrantup.com/intro)
- Ansible version 2.10 or higher
- Python 3
- pip3
- Molecule (see https://molecule.readthedocs.io/en/latest/installation.html)
To install molecule, use Python 3:
python3 -m pip install --user "molecule"
python3 -m pip install --user "molecule-vagrant"
This solution assumes you will resolve hostnames using a private DNS server. Since Vagrant doesn’t provide a DNS solution, we install some software in prepare.yml to give Vagrant an internal DNS resolver:
---
- name: Prepare
  hosts: all
  tasks:
    - name: Install epel-release
      yum:
        name: epel-release
        state: present
    - name: Install nss-mdns
      yum:
        name: nss-mdns
        state: present
    - name: Ensure avahi-daemon is started
      systemd:
        name: avahi-daemon
        state: started
With nss-mdns and avahi, the Vagrant machines can resolve each other's hostnames.
Note: DNS resolution in real-world scenarios is usually more complex.
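The prepare.yml shown earlier uses yum only. For apt-based boxes, an equivalent step might look like the sketch below. It assumes the Debian/Ubuntu package names avahi-daemon and libnss-mdns and is not taken from the role's own Molecule scenario.

```yaml
---
- name: Prepare (Debian family sketch)
  hosts: all
  become: true
  tasks:
    - name: Install avahi and the mDNS NSS plugin
      ansible.builtin.apt:
        name:
          - avahi-daemon
          - libnss-mdns   # assumed Debian/Ubuntu counterpart of nss-mdns
        state: present
        update_cache: true
      when: ansible_os_family == "Debian"

    - name: Ensure avahi-daemon is running
      ansible.builtin.systemd:
        name: avahi-daemon
        state: started
      when: ansible_os_family == "Debian"
```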
For hostname setup instructions, see:
https://www.vagrantup.com/docs/networking/basic_usage#setting-hostname
Architecture
The Clickhouse cluster layout is derived from hostnames, so make sure each hostname matches the following pattern:
^ch\\d{2}-shard\\d{2}-replica\\d{2}
Here, chX-shardY-replicaZ helps identify which replica belongs to which shard using regex.
Note: ansible_hostname is different from inventory_hostname:
- ansible_hostname: The name used in the OS; check it by running hostname on the host.
- inventory_hostname: The URL (IP or DNS) of the host, which must be resolvable by other cluster replicas.
Use {{ ansible_hostname }} for discovering replicas and {{ inventory_hostname }} for communication.
An example inventory could look like this:
[clickhouse]
<URL-ch01-shard01-replica01> ansible_host=<ip>
<URL-ch01-shard01-replica02> ansible_host=<ip>
...
<URL-ch01-shard02-replica01> ansible_host=<ip>
<URL-ch01-shard02-replica02> ansible_host=<ip>
...
<URL-chX-shardY-replicaZ> ansible_host=<ip>
[zookeeper]
<URL-zookeeper01> ansible_host=<ip>
...
<URL-zookeeperN> ansible_host=<ip>
Where URL (inventory_hostname) can be either an IP address or a DNS name, resolved at runtime.
For the cluster setup, both the clickhouse and zookeeper groups are required.
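For readers who prefer Ansible's YAML inventory format, the same layout could be written as in the sketch below. It shows two shards with two replicas each and three Zookeeper nodes. All DNS names and IP addresses are placeholders, not values from the role.

```yaml
---
all:
  children:
    clickhouse:
      hosts:
        # inventory_hostname (placeholder DNS names) -> reachable address
        ch01-shard01-replica01.example.internal:
          ansible_host: 192.0.2.11
        ch01-shard01-replica02.example.internal:
          ansible_host: 192.0.2.12
        ch01-shard02-replica01.example.internal:
          ansible_host: 192.0.2.21
        ch01-shard02-replica02.example.internal:
          ansible_host: 192.0.2.22
    zookeeper:
      hosts:
        zookeeper01.example.internal:
          ansible_host: 192.0.2.31
        zookeeper02.example.internal:
          ansible_host: 192.0.2.32
        zookeeper03.example.internal:
          ansible_host: 192.0.2.33
```

The inventory names only control how hosts are reached. Shard and replica discovery still relies on the hostname configured on each machine (ansible_hostname).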
Design
- Download: From the Yandex RPM repository. Downgrades are allowed using clickhouse_allow_downgrade.
- Config: Ensure the clickhouse group and user exist. Confirm paths and config files. Discover replicas and shards using regex.
- Install: Download and install using yum.
- Users: Manage a dynamic list of users (password management is not implemented).
- RBAC: To be implemented.
Role Default Variables
Check the variables in defaults.
Cluster Definition
Use these variables to define the main setup. The cluster setup depends on hostnames, so clickhouse_replica_name and clickhouse_shard_name are important, and clickhouse_hostname_regex defines the hostname regex: ^ch\\d{2}-(shard\\d{2})-(replica\\d{2}). See vars for more details.
# Clickhouse cluster definition
clickhouse_version: "20.8.7.15"
clickhouse_allow_downgrade: false
clickhouse_cluster_name: "mycluster"
clickhouse_service_name: "clickhouse-server"
clickhouse_service_status: "started"
# User and group details
clickhouse_group: "clickhouse"
clickhouse_user: "clickhouse"
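Because these are role defaults, they can be overridden from the inventory. The fragment below is a sketch of a group_vars file for the clickhouse group. The file path and cluster name are hypothetical, and it assumes the hosts currently run a newer release than the pinned one, which is the case clickhouse_allow_downgrade is meant for.

```yaml
---
# group_vars/clickhouse.yml (hypothetical path)
clickhouse_cluster_name: "analytics"   # hypothetical cluster name
clickhouse_version: "20.8.7.15"        # version to pin on every host
clickhouse_allow_downgrade: true       # permit installing an older package than the one present
```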
Clickhouse Installation Support
# Yum support
clickhouse_yum_repo: "https://repo.clickhouse.tech/rpm/stable/x86_64/"
clickhouse_yum_repo_key: "https://repo.clickhouse.tech//CLICKHOUSE-KEY.GPG"
clickhouse_yum_package:
  - "clickhouse-client-{{ clickhouse_version }}"
  - "clickhouse-common-static-{{ clickhouse_version }}"
  - "clickhouse-server-{{ clickhouse_version }}"
# Apt support
clickhouse_apt_repo: "deb https://repo.clickhouse.tech/deb/stable/ main/"
clickhouse_apt_repo_keyserver: "keyserver.ubuntu.com"
clickhouse_apt_repo_key: "E0C56BD4"
clickhouse_apt_package:
  - "clickhouse-client={{ clickhouse_version }}"
  - "clickhouse-common-static={{ clickhouse_version }}"
  - "clickhouse-server={{ clickhouse_version }}"
# Paths for configuration
clickhouse_path_config: "/etc/clickhouse-server"
clickhouse_path_config_d: "{{ clickhouse_path_config }}/config.d"
clickhouse_path_log: "/var/log/clickhouse-server"
clickhouse_path_data: "/var/lib/clickhouse"
Clickhouse Configuration
These variables set the main configuration.
clickhouse_config:
  max_connections: 2048
  keep_alive_timeout: 3
  max_concurrent_queries: 100
  uncompressed_cache_size: 8589934592
  mark_cache_size: 5368709120
  builtin_dictionaries_reload_interval: 3600
  max_session_timeout: 3600
  default_session_timeout: 60
  mlock_status: false
  merge_tree_config: []
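With Ansible's default hash behaviour, overriding a dictionary variable replaces it as a whole rather than merging keys. A sketch of an override therefore repeats every key and changes only the ones of interest. The two adjusted values below are illustrative, not tuning recommendations.

```yaml
---
clickhouse_config:
  max_connections: 4096            # changed (illustrative value)
  keep_alive_timeout: 3
  max_concurrent_queries: 200      # changed (illustrative value)
  uncompressed_cache_size: 8589934592
  mark_cache_size: 5368709120
  builtin_dictionaries_reload_interval: 3600
  max_session_timeout: 3600
  default_session_timeout: 60
  mlock_status: false
  merge_tree_config: []
```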
Networking
These variables relate to networking configuration.
clickhouse_http_port: 8123
clickhouse_https_port: 8443
clickhouse_tcp_port: 9000
clickhouse_tcp_secure_port: 9440
clickhouse_interserver_http: 9009
# Variable for listener host
clickhouse_listen_host: "{{ clickhouse_listen_host_default + clickhouse_listen_host_custom }}"
Note: clickhouse_listen_host must include an address on which the other cluster members can reach this host.
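Since clickhouse_listen_host is built by concatenating clickhouse_listen_host_default and clickhouse_listen_host_custom, extra listen addresses would go into the custom variable. The sketch below assumes clickhouse_listen_host_custom is a list, which the + concatenation suggests; its actual default is not shown here.

```yaml
---
# Assumption: clickhouse_listen_host_custom is a list of extra listen addresses.
clickhouse_listen_host_custom:
  - "0.0.0.0"   # listen on all IPv4 interfaces, in addition to the defaults
```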
Users
Use these variables to customize users. To remove a user, see the Clickhouse configuration file attributes: https://clickhouse.tech/docs/en/operations/configuration-files/
# Clickhouse users: https://clickhouse.tech/docs/en/operations/configuration-files/
clickhouse_users_list:
  - { user_name: "default",
      profile: "default",
      networks: ["::/1"],
      quota: "default" }
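As an illustration, a second user could be added by overriding the list. Overriding replaces the whole list, so the default entry is repeated as-is. The user name reporting and its network range are hypothetical, and since password management is not implemented, no password key is shown.

```yaml
---
clickhouse_users_list:
  - user_name: "default"
    profile: "default"
    networks: ["::/1"]
    quota: "default"
  - user_name: "reporting"          # hypothetical user
    profile: "default"
    networks: ["192.0.2.0/24"]      # placeholder network
    quota: "default"
```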
Zookeeper
The Zookeeper host list is based on inventory group patterns.
# Zookeeper is optional. If not installed, replication must be done client-side.
clickhouse_zookeeper_list: "{{ groups['zookeeper'] }}"
clickhouse_zookeeper_port: "2181"
Role Vars Variables
These are variables with higher priority than defaults and inventory group variables. They can only be changed by even higher priority variables, and typically should not be overridden.
The cluster setup depends on hostnames, making clickhouse_replica_name and clickhouse_shard_name important. The regex for hostnames is defined in clickhouse_hostname_regex: ^ch\\d{2}-(shard\\d{2})-(replica\\d{2}). See vars for more info.
Check the variables in vars.
---
# Regex for discovering shards and replicas
clickhouse_hostname_regex: "^ch\\d{2}-(shard\\d{2})-(replica\\d{2})"
# Discovery using regex
clickhouse_shard_name: "{{ ansible_hostname | regex_search(clickhouse_hostname_regex, '\\1') | first }}"
clickhouse_replica_name: "{{ ansible_hostname | regex_search(clickhouse_hostname_regex, '\\2') | first }}"
# Shard list is calculated from the replica list. Must match regex: see clickhouse_hostname_regex in vars/main.yml
clickhouse_shard_list: "{{ clickhouse_replica_list | map('extract', hostvars, 'ansible_hostname') | map('regex_search', clickhouse_hostname_regex, '\\1') | unique | map ('first') }}"
clickhouse_replica_list: "{{ groups['clickhouse'] }}"
clickhouse_listen_host_default:
  - "{{ inventory_hostname }}"
  - "127.0.0.1"
  - "::1"
These should NEVER be modified as they are essential for how the role identifies and links replicas with shards.
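To see what the discovery expressions evaluate to on a given host, a small standalone play can print them. This is a sketch for inspection only. It copies the regex into a hypothetical variable hostname_regex so that it runs without loading the role.

```yaml
---
- name: Show how shard and replica names are derived from the hostname
  hosts: clickhouse
  gather_facts: true   # needed so that ansible_hostname is populated
  vars:
    # Hypothetical variable holding a copy of clickhouse_hostname_regex.
    hostname_regex: "^ch\\d{2}-(shard\\d{2})-(replica\\d{2})"
  tasks:
    - name: Print the derived shard and replica names
      ansible.builtin.debug:
        msg:
          shard: "{{ ansible_hostname | regex_search(hostname_regex, '\\1') | first }}"
          replica: "{{ ansible_hostname | regex_search(hostname_regex, '\\2') | first }}"
```

On a machine whose hostname is ch01-shard02-replica01, this reports shard02 and replica01. If the hostname does not match the pattern, regex_search returns nothing and the first filter fails, which is why the hostname format matters.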
Role Tags
The following tags can be used in this role:
- ch:configure: Run only configuration tasks.
- ch:install: Download and install the software.
- ch:service: Manage the systemctl service status.
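A sketch of how the tags might be used from the command line follows. The playbook and inventory file names are hypothetical. The role is applied through a static roles: entry so that the tags on its tasks are visible to --tags. With a dynamic include_role, the include task itself would also have to be selected.

```yaml
---
# site.yml (hypothetical file name)
# Run only the configuration tasks with:
#   ansible-playbook -i inventory.yml site.yml --tags ch:configure
- hosts: clickhouse
  roles:
    - role: javigs82.clickhouse
```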
Dependencies
Clickhouse depends on Zookeeper for consistency.
Example Playbook
Here's an example of how to use this role:
    - hosts: my_clickhouse_group
      tasks:
        - include_role:
            name: javigs82.clickhouse
          vars:
            clickhouse_cluster_name: "e-commerce"
            clickhouse_replica_list: "{{ groups['my_clickhouse_group'] }}"
            clickhouse_zookeeper_list: "{{ groups['my_zookeeper_group'] }}"
References
- https://clickhouse.tech/docs/en/getting-started
- https://docs.ansible.com/ansible/latest/user_guide/intro_patterns.html
Author
- javigs82 GitHub
Acknowledgments
- https://github.com/AlexeySetevoi/ansible-clickhouse
- https://github.com/nl2go/ansible-role-clickhouse
- https://github.com/idealista/clickhouse_role
License
This project is licensed under the MIT License - see the LICENSE file for details.
ansible-galaxy install javigs82.clickhouse