Setting up a Dask cluster using Ansible

To create a Dask cluster spanning multiple machines reachable over SSH, Dask provides the following built-in methods:

  • Manual setup using the dask-scheduler and dask-worker commands
  • Automated setup using the dask-ssh command

(Detailed docs for these and other methods at https://docs.dask.org/en/latest/setup.html)

The first method above is simple, but tedious when working with a large cluster since it requires us to SSH into each of the hosts separately, and run the commands there.

The dask-ssh command is quite handy. However, it can be too simple at times. For example, if the hosts on our cluster use username/password for authentication or separate SSH keys per host, then this command falls short.

Another approach is to use one of the several popular infrastructure-automation tools (Puppet, Ansible etc.) to easily set up and tear down the dask cluster. Here, we will talk about Ansible since it is fairly popular and requires *almost* zero setup.

From here on, we assume that all the nodes have the prerequisites installed (Python, dask and dask.distributed). While this step can also be automated with Ansible, this is a one-time setup. So not much time to save here :/

Also, the scheduler is assumed to be already running on a known host (say at 11.22.33.44:8786). The steps to set up a Dask cluster would then be:

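  1. Install Ansible on your local machine. Ansible communicates with all the hosts over plain SSH, and requires no setup on the hosts themselves.
  2. Create an inventory file listing the hosts, along with the parameters needed to reach them and start the workers, such as:
  • The SSH key used to log in to the hosts
  • The scheduler’s address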

A sample inventory file for 6 hosts (divided into 2 groups) looks like this:
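A minimal sketch of such an inventory, written in Ansible's YAML inventory format (the same structure can also be expressed in INI form). The host addresses, the group names (group_a, group_b), the key path, and the variables scheduler_address, dask_worker_bin and worker_nthreads are all assumptions made for illustration; only ansible_ssh_private_key_file is a name Ansible itself defines.

```yaml
# hosts: hypothetical inventory for 6 hosts in 2 groups.
all:
  vars:
    # Properties shared by every node.
    ansible_ssh_private_key_file: "~/.ssh/id_rsa"   # assumed key path
    scheduler_address: "11.22.33.44:8786"           # our own variable name
  children:
    group_a:
      vars:
        # Group-specific settings (assumed values).
        dask_worker_bin: /usr/local/bin/dask-worker
        worker_nthreads: 4
      hosts:
        192.0.2.11:
        192.0.2.12:
        192.0.2.13:
    group_b:
      vars:
        dask_worker_bin: /opt/conda/bin/dask-worker
        worker_nthreads: 8
      hosts:
        192.0.2.21:
        192.0.2.22:
        192.0.2.23:
```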

In the first section, we have defined properties common to all nodes (such as the scheduler address, SSH key etc.). If different nodes use separate SSH keys, specify ansible_ssh_private_key_file per node or per node group instead.

Note the grouping being done here. It logically separates our hosts, which is useful when we want to handle differences between nodes (for example, the location of the dask binaries on the host, or node-specific environment variables which we would like to pass to the dask-worker command). Of course, this kind of grouping is optional: all the nodes can be defined in a single group, or even separately, each with their own parameters.

3. Create a playbook to spawn the dask worker. This .yaml file contains the instructions which ansible will run on the hosts specified in the inventory. A simple playbook to run dask-workers looks like this:
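One possible shape for such a playbook, as a sketch rather than a definitive version. It assumes the inventory defines the hypothetical variables scheduler_address, dask_worker_bin and worker_nthreads; the log path /tmp/dask-worker.log is likewise an arbitrary choice. The worker is started with nohup and sent to the background so that it keeps running after Ansible's SSH session ends.

```yaml
# cluster-start.yml: sketch of a playbook that starts one dask-worker per host.
- name: Start a dask-worker on every host
  hosts: all
  gather_facts: false
  tasks:
    - name: Launch dask-worker in the background
      # The folded scalar (>) joins these lines into a single shell command.
      # inventory_hostname is a built-in Ansible variable; the others are
      # assumed to be defined in the inventory.
      shell: >
        nohup {{ dask_worker_bin }} {{ scheduler_address }}
        --nthreads {{ worker_nthreads }}
        --name {{ inventory_hostname }}
        > /tmp/dask-worker.log 2>&1 < /dev/null &
```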

Note the shell command. It runs the dask-worker command, adding all the variables we defined in the inventory as command line arguments, which can be used in our Python worker code.

4. (optional) Create another playbook to stop all the dask workers. This is useful when debugging the setup process, and avoids having to SSH into each host to kill the dask-worker. This playbook essentially greps ps for dask-worker processes and sends a SIGTERM signal to each of them.
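A sketch of what such a stop playbook could look like. Instead of piping ps through grep, this version uses pkill -f, which matches against the full command line and sends the signal in one step; that substitution is a choice made for this example. The pattern is written as [d]ask-worker so that it cannot match the command line of the process that is running pkill.

```yaml
# cluster-stop.yml: sketch of a playbook that stops every dask-worker.
- name: Stop all dask-workers
  hosts: all
  gather_facts: false
  tasks:
    - name: Send SIGTERM to every dask-worker process
      command: pkill -TERM -f "[d]ask-worker"
      register: pkill_result
      # pkill exits with 1 when nothing matched; treat that as success too.
      failed_when: pkill_result.rc not in [0, 1]
```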

5. Test if ansible is able to communicate with all the nodes (6 here) using:

ansible -i hosts all -m ping

Next, start the cluster using:

ansible-playbook -i hosts cluster-start.yml

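If needed, test that the cluster works from your local machine: connect a dask.distributed Client to the scheduler (11.22.33.44:8786 here), submit a small task, and check that the workers show up as connected.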

6. (optional) Stop all the dask-workers using:

ansible-playbook -i hosts cluster-stop.yml

Appendix

  1. If using a username/password combination instead of keys, replace the ansible_ssh_private_key_file parameter with ansible_ssh_pass in the inventory. It is advisable to use keys though. To generate a key, use the ssh-keygen command, and then copy the key to all the hosts using ssh-copy-id -i <key_name_from_above> <host-address>.
