Distributed compilation in a cluster
In software development there is an unavoidable trend: applications become larger and more complex. For compiled programming languages, one of the consequences is that compilation takes longer.
Today I want to talk about using distcc to speed up C/C++ compilation using several nodes of a scientific cluster.
Distributed compilation
Some programming languages, like C++, are slow to compile. Ideally, the root causes of the slowness would be attacked and we would call it a day. However, the real causes are many and they are not trivial to solve. So, as an alternative for now, we can try to throw more resources at compiling C++.
distcc is a tool that helps us do so by distributing a C/C++ compilation across several nodes accessible via a network. distcc has relatively low expectations of the nodes: ideally you only need the same compiler installed everywhere. This is because the default operation mode of distcc distributes the preprocessed files. This works, but we can do a bit better if we are able to distribute the preprocessing as well; distcc calls this the pump mode.
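To give a feel for the mechanics before the cluster setup, this is roughly how distcc is driven on a single machine (the host name and core counts here are illustrative, not from the cluster used later):

```shell
# Tell distcc which machines can take jobs (localhost plus one remote worker).
export DISTCC_HOSTS="localhost remote-host/8,lzo"

# Default mode: the file is preprocessed locally, then compiled remotely.
distcc gcc -c hello.c -o hello.o

# Pump mode: preprocessing is distributed too; wrap the whole build in pump.
pump make -j8 CC="distcc gcc"
```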
Scientific clusters
Scientific clusters are designed to execute applications that need lots of computational resources. As such, they are usually structured as one or more login nodes plus a set of computational nodes. Users can connect to the login nodes, but can only access the computational nodes after resources have been allocated; the allocation is requested from a login node. A common resource manager for this is Slurm.
Example: compiling LLVM on 4 nodes
Install distcc
I’ll assume distcc is already installed in <install-distcc>. It is not difficult to install from source. Make sure the installation directory is in the PATH.
$ export PATH=<install-distcc>/bin:$PATH
Set up distcc
In general, using login nodes for anything other than allocating computational resources is frowned upon, so we will request 5 nodes. One of them, the main node, will orchestrate the compilation; the other 4 will do the compilation itself.
For this example I’m going to use Slurm. Even if your cluster also uses Slurm, its site configuration may differ and there may be small operational differences. Check your site’s documentation.
First of all, let’s request the resources using the salloc command from Slurm.
$ salloc --qos=debug --time=01:00:00 -N 5
salloc: Pending job allocation 12345678
salloc: job 12345678 queued and waiting for resources
salloc: job 12345678 has been allocated resources
salloc: Granted job allocation 12345678
salloc: Waiting for resource configuration
salloc: Nodes z04r1b64,z05r2b[38-39],z09r2b56,z13r1b01 are ready for job
In the cluster I’m using, debug is a special QoS meant for debugging or compiling applications. I’m requesting 5 nodes for 1 hour.
The allocation is often fulfilled quickly (but this depends on the level of utilisation of the cluster, which often correlates with deadlines!).
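While the request is pending, its state can be checked from another shell with squeue, using the job id printed by salloc:

```shell
# Query the state of our allocation (ST column: PD = pending, R = running).
squeue -j 12345678
```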
The cluster I’m using automatically logs you in to the first allocated node (z04r1b64). Some other clusters may require you to ssh into it first.
We need to make sure distcc will allow us to use the compiler we plan to use. So edit the file <install-distcc>/etc/distcc/commands.allow.sh and add the full path to the compiler you want to use. In my case I will be using clang installed in a non-default path.
Now we are going to use clush to start the distcc daemon on the other nodes.
$ clush -w "z05r2b[38-39],z09r2b56,z13r1b01" \
<install-distcc>/bin/distccd --daemon --allow 10.0.0.0/8 --enable-tcp-insecure
Luckily, clush understands Slurm’s nodeset notation, so we can just use it directly (but make sure you do not pass the main node). The --daemon flag requests distccd to run as a daemon, and --allow accepts connections from the private LAN used by the cluster. The flag --enable-tcp-insecure is required because, for simplicity, we will not use the masquerade feature of distcc; that is a safety feature that should be considered later.
At this point we must set up the DISTCC_HOSTS environment variable. Unfortunately, distcc cannot use Slurm’s nodeset notation. A tool called nodeset (from clush itself) will come in handy here.
$ nodeset -e z04r1b64,z05r2b[38-39],z09r2b56,z13r1b01
z04r1b64 z05r2b38 z05r2b39 z09r2b56 z13r1b01
The variable DISTCC_HOSTS has the syntax host/limit,options for each host. We are going to use lzo,cpp as the options: lzo compresses the files during network transfers, and cpp allows us to enable the pump mode. Each of our nodes has 48 cores, so let’s use that as the limit (distcc’s default is 4 concurrent jobs per host). Also, let’s not forget to remove the main node. We can use the following script.
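The script itself is not shown in the post; a minimal sketch of what set_distcc_hosts.sh might contain follows. It assumes the worker list has already been expanded with nodeset -e and the main node removed by hand; a fancier version could derive both from $SLURM_JOB_NODELIST and hostname.

```shell
#!/bin/bash
# set_distcc_hosts.sh -- hypothetical sketch, not part of distcc.
# Worker nodes: the allocation expanded with `nodeset -e`, minus the main node.
workers="z05r2b38 z05r2b39 z09r2b56 z13r1b01"

hosts=""
for node in $workers; do
    # 48 concurrent jobs per node, LZO compression, pump-mode (cpp) support.
    hosts="$hosts $node/48,lzo,cpp"
done
# Strip the leading space and export for distcc.
export DISTCC_HOSTS="${hosts# }"
```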
$ source set_distcc_hosts.sh
$ echo $DISTCC_HOSTS
z05r2b38/48,lzo,cpp z05r2b39/48,lzo,cpp z09r2b56/48,lzo,cpp z13r1b01/48,lzo,cpp
Now we are ready to build.
LLVM’s cmake
In a separate build directory, invoke cmake as usual. Make sure you specify the compiler you want to use; the easiest way to do this is setting CC and CXX. We will compile only clang in this example.
# Source of llvm is in ./llvm-src
$ mkdir llvm-build
$ cd llvm-build
$ CC="/apps/LLVM/12.0.1/GCC/bin/clang" CXX="/apps/LLVM/12.0.1/GCC/bin/clang++" \
cmake -G Ninja ../llvm-src/llvm \
-DCMAKE_BUILD_TYPE=Release "-DLLVM_ENABLE_PROJECTS=clang" \
    -DLLVM_ENABLE_LLD=ON -DCMAKE_RANLIB=$(which llvm-ranlib) \
-DCMAKE_AR=$(which llvm-ar) -DLLVM_PARALLEL_LINK_JOBS=48 \
-DCMAKE_C_COMPILER_LAUNCHER=distcc -DCMAKE_CXX_COMPILER_LAUNCHER=distcc
The important flags here are -DCMAKE_C_COMPILER_LAUNCHER=distcc and -DCMAKE_CXX_COMPILER_LAUNCHER=distcc. The build system will use these when building (but not during configuration, which is convenient). The other flags ensure a release build, force the build system to use lld, limit local link jobs to 48, and use llvm-ranlib and llvm-ar from LLVM, which are faster than the usual GNU counterparts.
The cmake invocation should complete successfully.
Before we continue, we must make sure the variables CPATH, C_INCLUDE_PATH and CPLUS_INCLUDE_PATH are not set, otherwise the pump mode will refuse to work.
$ unset CPATH C_INCLUDE_PATH CPLUS_INCLUDE_PATH
Now we can invoke ninja, telling it to run up to 192 (= 48 × 4) concurrent jobs. We have to use pump to enable distcc’s pump mode.
$ time pump ninja -j$((48 * 4))
__________Using distcc-pump from <install-distcc>/bin
__________Using 4 distcc servers in pump mode
[4382/4382] Linking CXX executable bin/c-index-test
__________Shutting down distcc-pump include server
real 3m0.985s
user 4m42.380s
sys 2m56.167s
3 minutes to compile clang+LLVM in Release mode is not bad 😀