javigs82.clickhouse
Ansible Clickhouse
This role sets up and configures a Clickhouse cluster with N shards and M replicas. It is tested using apt and yum with Molecule on Vagrant.
The cluster is defined through Ansible inventory groups (see inventory patterns). The following groups are needed to run the cluster:
- clickhouse: This group includes all Clickhouse hosts. Each host must have a hostname such as ch01-shard01-replica01, following the pattern ^ch\\d{2}-shard\\d{2}-replica\\d{2}
- zookeeper: This group contains all Zookeeper hosts.
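As a quick pre-flight check, the hostname pattern above can be tested outside Ansible. The following is a minimal Python sketch, not part of the role; the function name `invalid_hostnames` and the sample names are hypothetical, and the doubled backslashes of the YAML form become single backslashes in a Python raw string.

```python
import re

# Pattern taken from the role documentation ("\\d" in YAML is "\d" here).
HOSTNAME_PATTERN = re.compile(r"^ch\d{2}-shard\d{2}-replica\d{2}")


def invalid_hostnames(hostnames):
    """Return the hostnames that do not match the expected pattern."""
    return [name for name in hostnames if not HOSTNAME_PATTERN.match(name)]


if __name__ == "__main__":
    # Hypothetical sample names: the second has a one-digit shard number,
    # the third does not follow the scheme at all.
    candidates = ["ch01-shard01-replica01", "ch01-shard1-replica01", "clickhouse-01"]
    print(invalid_hostnames(candidates))
```

Running such a check against the planned hostnames before provisioning may catch naming mistakes that would otherwise only surface when the role tries to discover shards and replicas.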
You can check the defaults file to understand how shards and replicas are determined based on hostnames.
Note: {{ inventory_hostname }} refers to the DNS name or IP address, while {{ ansible_hostname }} is the machine's hostname.
To make sure the hostname ({{ ansible_hostname }}) is set correctly, run this command on the host machine:
hostname
Requirements
- Vagrant with VirtualBox (see https://www.vagrantup.com/intro)
- Ansible version 2.10 or higher
- Python 3
- pip3
- Molecule (see https://molecule.readthedocs.io/en/latest/installation.html)
To install molecule, use Python 3:
python3 -m pip install --user "molecule"
python3 -m pip install --user "molecule-vagrant"
This solution assumes you will resolve hostnames using a private DNS server. Since Vagrant doesn’t provide a DNS solution, we install some software in prepare.yml to give Vagrant an internal DNS resolver:
---
- name: Prepare
  hosts: all
  tasks:
    - name: Install epel-release
      yum:
        name: epel-release
        state: present
    - name: Install nss-mdns
      yum:
        name: nss-mdns
        state: present
    # The task starts avahi-daemon, so the name now says so
    - name: Ensure avahi-daemon is started
      systemd:
        name: avahi-daemon
        state: started
With nss-mdns and avahi, Vagrant can resolve hostnames.
Note: DNS resolution in real-world scenarios is usually more complex.
For hostname setup instructions, see:
https://www.vagrantup.com/docs/networking/basic_usage#setting-hostname
Architecture
The Clickhouse cluster is based on hostnames, so make sure each hostname follows this pattern:
^ch\\d{2}-shard\\d{2}-replica\\d{2}
Here, chX-shardY-replicaZ lets the role identify, by regex, which replica belongs to which shard.
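To make the hostname-to-topology idea concrete, here is a small Python sketch that groups hostnames by their shard part. It only illustrates the naming scheme and is not the role's implementation; the function name `replicas_by_shard` is hypothetical, and the regex uses the two capture groups from the role's vars.

```python
import re
from collections import defaultdict

# Same scheme as the documented pattern, with capture groups for shard and replica.
HOSTNAME_REGEX = re.compile(r"^ch\d{2}-(shard\d{2})-(replica\d{2})")


def replicas_by_shard(hostnames):
    """Map each shard name to the hostnames of its replicas, in input order."""
    shards = defaultdict(list)
    for name in hostnames:
        match = HOSTNAME_REGEX.match(name)
        if match is None:
            raise ValueError(f"hostname does not follow the pattern: {name!r}")
        shards[match.group(1)].append(name)
    return dict(shards)


if __name__ == "__main__":
    hosts = [
        "ch01-shard01-replica01",
        "ch01-shard01-replica02",
        "ch01-shard02-replica01",
        "ch01-shard02-replica02",
    ]
    for shard, members in replicas_by_shard(hosts).items():
        print(shard, members)
```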
Note: ansible_hostname is different from inventory_hostname:
- ansible_hostname: The name used in the OS; check it by running hostname on the host.
- inventory_hostname: The URL (IP or DNS) of the host, which must be resolvable by other cluster replicas.
Use {{ ansible_hostname }} for discovering replicas and {{ inventory_hostname }} for communication.
An example inventory could look like this:
[clickhouse]
<URL-ch01-shard01-replica01> ansible_host=<ip>
<URL-ch01-shard01-replica02> ansible_host=<ip>
...
<URL-ch01-shard02-replica01> ansible_host=<ip>
<URL-ch01-shard02-replica02> ansible_host=<ip>
...
<URL-chX-shardY-replicaZ> ansible_host=<ip>
[zookeeper]
<URL-zookeeper01> ansible_host=<ip>
...
<URL-zookeeperN> ansible_host=<ip>
Here, URL (inventory_hostname) can be either an IP address or a DNS name, resolved at runtime.
For the cluster setup, both the clickhouse and zookeeper groups are required.
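For a cluster with N shards and M replicas, the hostnames in the inventory example follow a regular scheme, so they can be generated rather than typed. The Python sketch below is an illustration under assumptions: the function name `cluster_hostnames` is hypothetical, and since the source does not say what the number after `ch` stands for, it is treated here as a fixed index that defaults to 1.

```python
def cluster_hostnames(shards, replicas, cluster_index=1):
    """Build hostnames of the form chXX-shardYY-replicaZZ.

    Assumption: cluster_index fills the "chXX" part; its meaning is not
    specified in the role documentation.
    """
    for value in (shards, replicas, cluster_index):
        # The pattern allows exactly two digits per field.
        if not 1 <= value <= 99:
            raise ValueError("each value must be between 1 and 99")
    return [
        f"ch{cluster_index:02d}-shard{shard:02d}-replica{replica:02d}"
        for shard in range(1, shards + 1)
        for replica in range(1, replicas + 1)
    ]


if __name__ == "__main__":
    # Two shards with two replicas each, as in the inventory example.
    for name in cluster_hostnames(2, 2):
        print(name)
```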
Design
- Download: From the Yandex RPM repository. Downgrades are allowed using clickhouse_allow_downgrade.
- Config: Ensure the clickhouse group and user are set. Confirm paths and config files. Discover replicas and shards using regex.
- Install: Download and install using yum.
- Users: Manage a dynamic list of users (password management is not implemented).
- RBAC: To be implemented.
Role Default Variables
Check the variables in defaults.
Cluster Definition
Use these variables to define the main setup. The cluster setup depends on hostnames, so clickhouse_replica_name and clickhouse_shard_name are important, and clickhouse_hostname_regex defines the hostname regex: ^ch\\d{2}-(shard\\d{2})-(replica\\d{2}). See vars for more details.
# Clickhouse cluster definition
clickhouse_version: "20.8.7.15"
clickhouse_allow_downgrade: false
clickhouse_cluster_name: "mycluster"
clickhouse_service_name: "clickhouse-server"
clickhouse_service_status: "started"
# User and group details
clickhouse_group: "clickhouse"
clickhouse_user: "clickhouse"
Clickhouse Installation Support
# Yum support
clickhouse_yum_repo: "https://repo.clickhouse.tech/rpm/stable/x86_64/"
clickhouse_yum_repo_key: "https://repo.clickhouse.tech//CLICKHOUSE-KEY.GPG"
clickhouse_yum_package:
- "clickhouse-client-{{ clickhouse_version }}"
- "clickhouse-common-static-{{ clickhouse_version }}"
- "clickhouse-server-{{ clickhouse_version }}"
# Apt support
clickhouse_apt_repo: "deb https://repo.clickhouse.tech/deb/stable/ main/"
clickhouse_apt_repo_keyserver: "keyserver.ubuntu.com"
clickhouse_apt_repo_key: "E0C56BD4"
clickhouse_apt_package:
- "clickhouse-client={{ clickhouse_version }}"
- "clickhouse-common-static={{ clickhouse_version }}"
- "clickhouse-server={{ clickhouse_version }}"
# Paths for configuration
clickhouse_path_config: "/etc/clickhouse-server"
clickhouse_path_config_d: "{{ clickhouse_path_config }}/config.d"
clickhouse_path_log: "/var/log/clickhouse-server"
clickhouse_path_data: "/var/lib/clickhouse"
Clickhouse Configuration
These variables set the main configuration.
clickhouse_config:
  max_connections: 2048
  keep_alive_timeout: 3
  max_concurrent_queries: 100
  uncompressed_cache_size: 8589934592
  mark_cache_size: 5368709120
  builtin_dictionaries_reload_interval: 3600
  max_session_timeout: 3600
  default_session_timeout: 60
  mlock_status: false
  merge_tree_config: []
Networking
These variables relate to networking configuration.
clickhouse_http_port: 8123
clickhouse_https_port: 8443
clickhouse_tcp_port: 9000
clickhouse_tcp_secure_port: 9440
clickhouse_interserver_http: 9009
# Variable for listener host
clickhouse_listen_host: "{{ clickhouse_listen_host_default + clickhouse_listen_host_custom }}"
Note: clickhouse_listen_host must include an address that the other cluster members can reach.
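The `clickhouse_listen_host` expression concatenates two lists with `+`. The Python sketch below mirrors that behaviour to show what the final list may look like; the function name `listen_hosts` and the custom address are hypothetical, and the default entries are modelled on `clickhouse_listen_host_default` from the role vars.

```python
def listen_hosts(default_hosts, custom_hosts):
    """Concatenate the default and custom listener lists, keeping order.

    Like the "+" operator on lists in the template expression, this does
    not remove duplicates.
    """
    return list(default_hosts) + list(custom_hosts)


if __name__ == "__main__":
    # The first entry stands in for "{{ inventory_hostname }}".
    defaults = ["ch01-shard01-replica01.example", "127.0.0.1", "::1"]
    # Hypothetical extra address supplied through clickhouse_listen_host_custom.
    custom = ["10.0.0.5"]
    print(listen_hosts(defaults, custom))
```

Because the result is a plain concatenation, an empty custom list leaves the defaults unchanged.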
Users
Use these variables to customize users. To remove a user, refer to the Clickhouse configuration file attributes: https://clickhouse.tech/docs/en/operations/configuration-files/
# Clickhouse users: https://clickhouse.tech/docs/en/operations/configuration-files/
clickhouse_users_list:
  - { user_name: "default",
      profile: "default",
      networks: ["::/1"],
      quota: "default" }
Zookeeper
The Zookeeper host list is based on inventory group patterns.
# Zookeeper is optional. If not installed, replication must be done client-side.
clickhouse_zookeeper_list: "{{ groups['zookeeper'] }}"
clickhouse_zookeeper_port: "2181"
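The role's own template is not shown here, so the following is only a sketch of how a host list and port like the ones above could be turned into the `<zookeeper>` section that ClickHouse server configuration expects (one `<node>` with `<host>` and `<port>` per server). The function name `zookeeper_section` and the sample hostnames are hypothetical.

```python
import xml.etree.ElementTree as ET


def zookeeper_section(hosts, port="2181"):
    """Render a ClickHouse-style <zookeeper> config section as a string."""
    root = ET.Element("zookeeper")
    for index, host in enumerate(hosts, start=1):
        # One <node> per Zookeeper server, numbered from 1.
        node = ET.SubElement(root, "node", index=str(index))
        ET.SubElement(node, "host").text = host
        ET.SubElement(node, "port").text = str(port)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical members of the zookeeper inventory group.
    print(zookeeper_section(["zookeeper01", "zookeeper02"]))
```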
Role Vars Variables
These are variables with higher priority than defaults and inventory group variables. They can only be changed by even higher priority variables, and typically should not be overridden.
The cluster setup depends on hostnames, making clickhouse_replica_name and clickhouse_shard_name important. The regex for hostnames is defined in clickhouse_hostname_regex: ^ch\\d{2}-(shard\\d{2})-(replica\\d{2}). Check the variables in vars.
---
# Regex for discovering shards and replicas
clickhouse_hostname_regex: "^ch\\d{2}-(shard\\d{2})-(replica\\d{2})"
# Discovery using regex
clickhouse_shard_name: "{{ ansible_hostname | regex_search(clickhouse_hostname_regex, '\\1') | first }}"
clickhouse_replica_name: "{{ ansible_hostname | regex_search(clickhouse_hostname_regex, '\\2') | first }}"
# Shard list is calculated from the replica list. Must match regex: see clickhouse_hostname_regex in vars/main.yml
clickhouse_shard_list: "{{ clickhouse_replica_list | map('extract', hostvars, 'ansible_hostname') | map('regex_search', clickhouse_hostname_regex, '\\1') | unique | map ('first') }}"
clickhouse_replica_list: "{{ groups['clickhouse'] }}"
clickhouse_listen_host_default:
- "{{ inventory_hostname }}"
- "127.0.0.1"
- "::1"
These should NEVER be modified as they are essential for how the role identifies and links replicas with shards.
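The discovery expressions above can be approximated in Python to see what they yield for a given hostname. This is a rough stand-in, not Ansible's code: `regex_search_groups` is a hypothetical helper that imitates `regex_search` called with backreference arguments (returning a list of the requested groups), and the other function names are likewise illustrative.

```python
import re

# clickhouse_hostname_regex with YAML escaping removed.
HOSTNAME_REGEX = r"^ch\d{2}-(shard\d{2})-(replica\d{2})"


def regex_search_groups(value, pattern, *group_indexes):
    """Return the requested capture groups as a list, or None if no match."""
    match = re.search(pattern, value)
    if match is None:
        return None
    return [match.group(index) for index in group_indexes]


def shard_name(hostname):
    # Equivalent in spirit to: regex_search(regex, '\\1') | first
    return regex_search_groups(hostname, HOSTNAME_REGEX, 1)[0]


def replica_name(hostname):
    # Equivalent in spirit to: regex_search(regex, '\\2') | first
    return regex_search_groups(hostname, HOSTNAME_REGEX, 2)[0]


def shard_list(hostnames):
    """Unique shard names in first-seen order, as derived from the replica list."""
    seen = []
    for hostname in hostnames:
        shard = shard_name(hostname)
        if shard not in seen:
            seen.append(shard)
    return seen


if __name__ == "__main__":
    hosts = ["ch01-shard01-replica01", "ch01-shard01-replica02", "ch01-shard02-replica01"]
    print(shard_name(hosts[0]), replica_name(hosts[0]))
    print(shard_list(hosts))
```

In this sketch a hostname that does not match makes `shard_name` fail on the `None` result, which illustrates why the naming pattern has to be followed exactly.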
Role Tags
The following tags can be used in this role:
- ch:configure: Run only configuration tasks.
- ch:install: Download and install the software.
- ch:service: Manage systemctl service status.
Dependencies
Clickhouse depends on Zookeeper for consistency.
Example Playbook
Here's an example of how to use this role:
- hosts: my_clickhouse_group
  tasks:
    - include_role:
        name: javigs82.clickhouse
      vars:
        clickhouse_cluster_name: "e-commerce"
        clickhouse_replica_list: "{{ groups['my_clickhouse_group'] }}"
        clickhouse_zookeeper_list: "{{ groups['my_zookeeper_group'] }}"
References
- https://clickhouse.tech/docs/en/getting-started
- https://docs.ansible.com/ansible/latest/user_guide/intro_patterns.html
Author
- javigs82 GitHub
Acknowledgments
- https://github.com/AlexeySetevoi/ansible-clickhouse
- https://github.com/nl2go/ansible-role-clickhouse
- https://github.com/idealista/clickhouse_role
License
This project is licensed under the MIT License - see the LICENSE file for details.
ansible-galaxy install javigs82.clickhouse