Setting up a Dask cluster using Ansible
To create a Dask cluster spanning multiple machines connected via SSH, Dask provides the following built-in methods:
- Manual setup using the
- Using the dask-ssh command, which accepts a list of hostnames and other SSH authentication options, and spawns the scheduler and workers on the hosts
(Detailed docs for these and other methods at https://docs.dask.org/en/latest/setup.html)
The first method above is simple, but tedious when working with a large cluster since it requires us to SSH into each of the hosts separately, and run the commands there.
The dask-ssh command is quite handy. However, it can be too simple at times. For example, if the hosts in our cluster use username/password authentication or a separate SSH key per host, this command falls short.
Another approach is to use one of the several popular infrastructure-automation tools (Puppet, Ansible etc.) to easily set up and tear down the dask cluster. Here, we will talk about Ansible since it is fairly popular and requires *almost* zero setup.
From here on, we assume that all the nodes have the prerequisites installed (Python, dask and dask-distributed). While this step can also be automated with Ansible, this is a one-time setup. So not much time to save here :/
Also, the scheduler is assumed to be already running on a known host (say at 18.104.22.168:8786). The steps to set up a Dask cluster would then be:
1. Install Ansible on your local machine. Ansible communicates with all the hosts using simple SSH, and requires no setup on the hosts themselves.
2. Create an Ansible inventory file (hosts.ini). This file contains details about all the hosts on which we would like to spawn workers (their IP addresses, path to dask binaries on the host etc.), and other variables such as:
- The scheduler’s address
- SSH authentication details (keys, username/password)
A sample inventory file for 6 hosts (divided into 2 groups) looks like this:
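The sketch below is one way to lay such a file out. The host addresses, user name, key path, group names and the two custom variables (scheduler_address and dask_worker_bin) are placeholders assumed for illustration; only the ansible_* names are Ansible's own connection variables.

```ini
# hosts.ini: all values below are hypothetical placeholders.

# Variables shared by every host in the inventory.
[all:vars]
ansible_user=ubuntu
ansible_ssh_private_key_file=~/.ssh/dask_cluster_key
scheduler_address=18.104.22.168:8786

# First group of three hosts.
[group_a]
192.0.2.11
192.0.2.12
192.0.2.13

# Settings that apply only to group_a.
[group_a:vars]
dask_worker_bin=/usr/local/bin/dask-worker

# Second group of three hosts, with dask installed elsewhere.
[group_b]
192.0.2.21
192.0.2.22
192.0.2.23

[group_b:vars]
dask_worker_bin=/opt/conda/bin/dask-worker
```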
In the first section, we have defined properties common to all nodes (such as the scheduler address, SSH key etc.). If we use separate SSH keys for different nodes, this can be done by specifying ansible_ssh_private_key_file separately per node or per node group.
Note the grouping being done here. It logically separates our hosts, which is useful when we want to handle differences between nodes (for example, the location of dask binaries on the host, or node-specific environment variables which we would like to pass to the dask-worker command). Of course, this kind of grouping is optional, and all the nodes can be defined in a single group, or even separately, each with their own parameters.
3. Create a playbook to spawn the dask worker. This .yaml file contains the instructions which ansible will run on the hosts specified in the inventory. A simple playbook to run dask-workers looks like this:
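As a minimal sketch, assuming the inventory defines two custom variables named dask_worker_bin (path to the dask-worker binary) and scheduler_address (both names are hypothetical), and that a log file under /tmp is acceptable, such a playbook might be:

```yaml
# cluster-start.yml: a sketch, not a hardened deployment.
# dask_worker_bin and scheduler_address are assumed inventory variables.
- name: Start dask workers
  hosts: all
  gather_facts: false
  tasks:
    - name: Launch dask-worker in the background
      # nohup plus full redirection lets the worker outlive the SSH session.
      shell: "nohup {{ dask_worker_bin }} {{ scheduler_address }} > /tmp/dask-worker.log 2>&1 < /dev/null &"
```

Any extra per-group variables could be appended to the same command line in the same way.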
This playbook uses Ansible's shell module. It runs the dask-worker command, adding all the variables we defined in the inventory as command line arguments, which can be used in our Python worker code.
4. (optional) Create another playbook to stop all the dask workers. This is useful when debugging the setup process, and avoids having to SSH into each host to kill the dask-worker. This playbook essentially greps the output of ps for dask-worker processes and sends a SIGTERM signal to each of them.
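One possible version of this stop playbook is sketched below. It assumes GNU xargs on the hosts (for the -r flag, which skips kill when nothing matches) and that every process whose command line contains "dask-worker" should be stopped.

```yaml
# cluster-stop.yml: a sketch under the assumptions stated above.
- name: Stop dask workers
  hosts: all
  gather_facts: false
  tasks:
    - name: Send SIGTERM to every dask-worker process
      # The [d] in the pattern keeps grep from matching its own process.
      shell: "ps -ef | grep '[d]ask-worker' | awk '{print $2}' | xargs -r kill -TERM"
```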
5. Test if Ansible is able to communicate with all the nodes (6 here) using:
ansible -i hosts.ini all -m ping
Next, start the cluster using:
ansible-playbook -i hosts.ini cluster-start.yml
If needed, test if the cluster works by running something like this on your local machine:
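A small check along these lines is sketched here. The function names square and check_cluster are made up for illustration, and the snippet assumes dask.distributed is installed locally and the scheduler is reachable at the address used earlier.

```python
def square(x):
    """Toy task that each worker can run independently."""
    return x * x


def check_cluster(scheduler_address):
    """Run a tiny job on the cluster and return (worker_count, total)."""
    # Imported here so this file can be loaded on machines without dask.
    from dask.distributed import Client

    with Client(scheduler_address) as client:
        # One entry per worker currently registered with the scheduler.
        worker_count = len(client.nthreads())
        # Fan the toy task out across the workers, then collect the results.
        futures = client.map(square, range(10))
        total = sum(client.gather(futures))
    return worker_count, total
```

Calling check_cluster("18.104.22.168:8786") from a Python shell should return the number of connected workers together with 285, the sum of the squares of 0 through 9.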
6. (optional) Stop all the dask-workers using:
ansible-playbook -i hosts.ini cluster-stop.yml
- If using a username/password combination instead of keys, replace the ansible_ssh_private_key_file parameter in the inventory with ansible_ssh_pass. It is advisable to use keys though. To generate a key, use the ssh-keygen command, and then copy the key to all the hosts using ssh-copy-id -i <key_name_from_above> <host-address>.