Distributed compilation in a cluster
In software development there is an unavoidable trend: applications become larger and more complex. For compiled programming languages, one of the consequences is that compilation takes longer.
Today I want to talk about using distcc to speed up C/C++ compilation using several nodes of a scientific cluster.
Distributed compilation
Some programming languages, like C++, are slow to compile. Ideally, the root causes of the slowness would be attacked and we would call it a day. However, the real causes are many and they are not trivial to solve. So, as an alternative for now, we can try to throw more resources at compiling C++.
distcc is a tool that helps us do so by distributing a C/C++ compilation across several nodes accessible via a network. distcc has relatively low expectations of the nodes: ideally you only need the same compiler installed everywhere. This is because the default operation mode of distcc distributes the preprocessed files. This works, but we can do a bit better if we are able to distribute the preprocessing as well; distcc calls this the pump mode.
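To give a feel for the mechanics before the cluster setup, this is roughly how distcc is driven on a single machine (the host name and core counts here are illustrative, not from the cluster used later):

```shell
# Tell distcc which machines can take jobs (localhost plus one remote worker).
export DISTCC_HOSTS="localhost remote-host/8,lzo"

# Default mode: the file is preprocessed locally, then compiled remotely.
distcc gcc -c hello.c -o hello.o

# Pump mode: preprocessing is distributed too; wrap the whole build in pump.
pump make -j8 CC="distcc gcc"
```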
Scientific clusters
Scientific clusters are designed to execute applications that need lots of computational resources. As such, they are usually structured as one or more login nodes plus a set of computational nodes. Users can connect to the login nodes, but can only access the computational nodes after resources have been allocated; the allocation is requested from a login node. A common resource manager for this is Slurm.
Example: compiling LLVM on 4 nodes
Install distcc
I’ll assume distcc is already installed in <install-distcc>. It is not difficult to install from source. Make sure the installation directory is in the PATH.
$ export PATH=<install-distcc>/bin:$PATH
Set up distcc
In general, using login nodes for anything other than allocating computational resources is frowned upon, so we will request 5 nodes. One of them, the main node, will orchestrate the compilation; the other 4 will do the compilation itself.
For this example I’m going to use Slurm. Even if your cluster also uses Slurm, its site configuration may differ and there may be small operational differences. Check your site’s documentation.
First of all, let’s request the resources using the salloc command from Slurm.
$ salloc --qos=debug --time=01:00:00 -N 5
salloc: Pending job allocation 12345678
salloc: job 12345678 queued and waiting for resources
salloc: job 12345678 has been allocated resources
salloc: Granted job allocation 12345678
salloc: Waiting for resource configuration
salloc: Nodes z04r1b64,z05r2b[38-39],z09r2b56,z13r1b01 are ready for job
In the cluster I’m using, debug is a special QoS meant for debugging or compiling applications. I’m requesting 5 nodes for 1 hour.
The allocation is often fulfilled quickly (but this depends on the level of utilisation of the cluster, which often correlates with deadlines!).
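While the request is pending, its state can be checked from another shell with squeue, using the job id printed by salloc:

```shell
# Query the state of our allocation (ST column: PD = pending, R = running).
squeue -j 12345678
```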
The cluster I’m using automatically logs you in to the first allocated node (z04r1b64). Some other clusters may require you to ssh into it first.
We need to make sure distcc will allow us to use the compiler we plan to use. So edit the file <install-distcc>/etc/distcc/commands.allow.sh and add the full path to the compiler you want to use. In my case I will be using clang installed in a non-default path.
Now we are going to use clush to start the distcc daemon on the other nodes.
$ clush -w "z05r2b[38-39],z09r2b56,z13r1b01" \
<install-distcc>/bin/distccd --daemon --allow 10.0.0.0/8 --enable-tcp-insecure
Luckily, clush understands Slurm’s nodeset notation, so we can just use it directly (but make sure you do not pass the main node). The --daemon flag requests distccd to run as a daemon, and --allow accepts connections from the private LAN used by the cluster. The flag --enable-tcp-insecure is required because, for simplicity, we will not use the masquerade feature of distcc; that is a safety feature that should be considered later.
At this point we must set up the DISTCC_HOSTS environment variable. Unfortunately, distcc cannot use Slurm’s nodeset notation. A tool called nodeset (from clush itself) will come in handy here.
$ nodeset -e z04r1b64,z05r2b[38-39],z09r2b56,z13r1b01
z04r1b64 z05r2b38 z05r2b39 z09r2b56 z13r1b01
The variable DISTCC_HOSTS has the syntax host/limit,options for each host. We are going to use lzo,cpp as the options: lzo compresses the files during network transfers, and cpp allows us to enable the pump mode. Each of our nodes has 48 cores, so let’s use that as the limit (distcc’s default is 4 concurrent jobs per host). Also, let’s not forget to remove the main node. We can use the following script.
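The script itself is not shown in the post; a minimal sketch of what set_distcc_hosts.sh might contain follows. It assumes the worker list has already been expanded with nodeset -e and the main node removed by hand; a fancier version could derive both from $SLURM_JOB_NODELIST and hostname.

```shell
#!/bin/bash
# set_distcc_hosts.sh -- hypothetical sketch, not part of distcc.
# Worker nodes: the allocation expanded with `nodeset -e`, minus the main node.
workers="z05r2b38 z05r2b39 z09r2b56 z13r1b01"

hosts=""
for node in $workers; do
    # 48 concurrent jobs per node, LZO compression, pump-mode (cpp) support.
    hosts="$hosts $node/48,lzo,cpp"
done
# Strip the leading space and export for distcc.
export DISTCC_HOSTS="${hosts# }"
```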
$ source set_distcc_hosts.sh
$ echo $DISTCC_HOSTS
z05r2b38/48,lzo,cpp z05r2b39/48,lzo,cpp z09r2b56/48,lzo,cpp z13r1b01/48,lzo,cpp
Now we are ready to build.
LLVM’s cmake
In a separate build directory, invoke cmake as usual. Make sure you specify the compiler you want to use; the easiest way to do this is setting CC and CXX. We will compile only clang in this example.
# Source of llvm is in ./llvm-src
$ mkdir llvm-build
$ cd llvm-build
$ CC="/apps/LLVM/12.0.1/GCC/bin/clang" CXX="/apps/LLVM/12.0.1/GCC/bin/clang++" \
cmake -G Ninja ../llvm-src/llvm \
-DCMAKE_BUILD_TYPE=Release "-DLLVM_ENABLE_PROJECTS=clang" \
    -DLLVM_ENABLE_LLD=ON -DCMAKE_RANLIB=$(which llvm-ranlib) \
-DCMAKE_AR=$(which llvm-ar) -DLLVM_PARALLEL_LINK_JOBS=48 \
-DCMAKE_C_COMPILER_LAUNCHER=distcc -DCMAKE_CXX_COMPILER_LAUNCHER=distcc
The important flags here are -DCMAKE_C_COMPILER_LAUNCHER=distcc and -DCMAKE_CXX_COMPILER_LAUNCHER=distcc. The build system will use these when building (but not during configuration, which is convenient). The other flags ensure a release build, force the build system to use lld, limit local link jobs to 48, and use llvm-ranlib and llvm-ar from LLVM, which are faster than the usual GNU counterparts.
The cmake invocation should complete successfully.
Before we continue, we must make sure the variables CPATH, C_INCLUDE_PATH and CPLUS_INCLUDE_PATH are not set, otherwise the pump mode will refuse to work.
$ unset CPATH C_INCLUDE_PATH CPLUS_INCLUDE_PATH
Now we can invoke ninja, telling it to run up to 192 (= 48 × 4) concurrent jobs. We have to use pump to enable distcc’s pump mode.
$ time pump ninja -j$((48 * 4))
__________Using distcc-pump from <install-distcc>/bin
__________Using 4 distcc servers in pump mode
[4382/4382] Linking CXX executable bin/c-index-test
__________Shutting down distcc-pump include server
real 3m0.985s
user 4m42.380s
sys 2m56.167s
3 minutes to compile clang+LLVM in Release mode is not bad 😀