In software development there is an unavoidable trend: applications become larger and more complex. For compiled programming languages, one of the consequences is that compilation takes longer.
Today I want to talk about using distcc to speed up C/C++ compilation by using several nodes of a scientific cluster.
Some programming languages, like C++, are slow to compile. Ideally, the root causes of this slowness would be addressed and we would call it a day. However, the causes are many and they are not trivial to solve. So, as an alternative for now, we can try to throw more resources at compiling C++.
distcc is a tool that helps us do so by distributing a C/C++ compilation across several nodes accessible via a network. distcc has relatively low expectations of the nodes: ideally you only need the same compiler installed everywhere. This is because the default operation mode of distcc is based on distributing the preprocessed files. This works, but we can do a bit better if we are able to also preprocess in a distributed way; distcc calls this the pump mode.
Scientific clusters are designed to execute applications that need lots of computational resources. As such, they are usually structured as one or more login nodes plus a set of computational nodes. Users can connect to the login nodes, but can only access the computational nodes after they have allocated resources; the allocation is requested from a login node. A common resource manager for this task is Slurm.
Example: compiling LLVM on 4 nodes
distcc is already installed in <install-distcc>. It is not difficult to install from source. Make sure the installation directory is in your PATH:
$ export PATH=<install-distcc>/bin:$PATH
Set up distcc
In general, using login nodes for anything other than allocating computational resources is frowned upon, so we will request 5 nodes. One of them, the main node, will be used to orchestrate the compilation, and the other 4 will be used for the compilation itself.
For this example I’m going to use Slurm. Even if your cluster is using Slurm too, its site-configuration may be different and there may be small operational differences. Check your site documentation.
First of all, let's request the resources using the salloc command from Slurm.
$ salloc --qos=debug --time=01:00:00 -N 5
salloc: Pending job allocation 12345678
salloc: job 12345678 queued and waiting for resources
salloc: job 12345678 has been allocated resources
salloc: Granted job allocation 12345678
salloc: Waiting for resource configuration
salloc: Nodes z04r1b64,z05r2b[38-39],z09r2b56,z13r1b01 are ready for job
In the cluster I’m using, debug is a special partition meant for debugging or for compiling applications. I’m requesting 5 nodes (-N 5) for 1 hour.
The allocation is often fulfilled quickly (but this depends on the level of utilisation of the cluster, which often correlates with deadlines!).
The cluster I’m using automatically logs you in to the first allocated node (in this case, z04r1b64). Some other clusters may require you to do this step manually.
We need to make sure distcc will allow us to use the compiler we plan to use. So edit the file <install-distcc>/etc/distcc/commands.allow.sh and add the full path to the compiler you want to use. In my case I will be using clang, installed in a non-default path.
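For illustration, the addition could look something along these lines; the variable name is the one used by distcc's allow-list mechanism, but check the comments inside your commands.allow.sh for the exact format it expects. The compiler paths below match the clang installation used later in this post.

```shell
# Illustrative addition to <install-distcc>/etc/distcc/commands.allow.sh:
# full paths of the compilers distccd is allowed to execute.
allowed_compilers="
  /apps/LLVM/12.0.1/GCC/bin/clang
  /apps/LLVM/12.0.1/GCC/bin/clang++
"
```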
Now we are going to use clush to start the distcc daemon on the other nodes.
$ clush -w "z05r2b[38-39],z09r2b56,z13r1b01" \
    <install-distcc>/bin/distccd --daemon --allow 10.0.0.0/8 --enable-tcp-insecure
clush understands Slurm’s nodeset notation, so we can just use it directly (but make sure you do not pass the main node). We request distccd to run as a --daemon and to allow connections from all the nodes of the private LAN used by the cluster. The flag --enable-tcp-insecure is required because, for simplicity, we will not use the masquerade feature of distcc. Masquerading is a safety feature that should be considered later.
At this point we must set up the DISTCC_HOSTS environment variable. Unfortunately, distcc cannot use Slurm’s nodeset notation. A tool called nodeset (distributed along with clush itself) will come in handy here.
$ nodeset -e z04r1b64,z05r2b[38-39],z09r2b56,z13r1b01
z04r1b64 z05r2b38 z05r2b39 z09r2b56 z13r1b01
DISTCC_HOSTS uses the following syntax for each host: host/limit,options. As options we are going to use lzo, which compresses the files during network transfers, and cpp, which allows us to enable the pump mode. Each of our nodes has 48 cores, so let's use that as the limit (distcc's default is 4 concurrent jobs per host). Also, let's not forget to remove the main node. We can use the following script.
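The script itself is not shown in the post, so here is a sketch of what set_distcc_hosts.sh could look like, assuming it is sourced on the main node and that nodeset and Slurm's standard SLURM_JOB_NODELIST environment variable are available:

```shell
# set_distcc_hosts.sh -- sketch; to be sourced from the main node.
# Builds DISTCC_HOSTS from the Slurm allocation, skipping the main node
# (the one we are logged into) and appending "/48,lzo,cpp" to each host.

build_distcc_hosts() {
  # $1: space-separated expanded host list, $2: host to exclude (main node)
  local out="" host
  for host in $1; do
    [ "$host" = "$2" ] && continue
    out="$out${out:+ }$host/48,lzo,cpp"
  done
  printf '%s\n' "$out"
}

# Expand Slurm's nodeset notation and drop the node we are running on.
if command -v nodeset >/dev/null 2>&1; then
  DISTCC_HOSTS=$(build_distcc_hosts "$(nodeset -e "$SLURM_JOB_NODELIST")" "$(hostname)")
  export DISTCC_HOSTS
fi
```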
$ source set_distcc_hosts.sh
$ echo $DISTCC_HOSTS
z05r2b38/48,lzo,cpp z05r2b39/48,lzo,cpp z09r2b56/48,lzo,cpp z13r1b01/48,lzo,cpp
Now we are ready to build.
In a separate build directory, invoke cmake as usual. Make sure you specify the compiler you want to use; the easiest way to do this is setting the CC and CXX environment variables. We will compile only clang in this example.
# Source of llvm is in ./llvm-src
$ mkdir llvm-build
$ cd llvm-build
$ CC="/apps/LLVM/12.0.1/GCC/bin/clang" CXX="/apps/LLVM/12.0.1/GCC/bin/clang++" \
    cmake -G Ninja ../llvm-src/llvm \
      -DCMAKE_BUILD_TYPE=Release "-DLLVM_ENABLE_PROJECTS=clang" \
      -DLLVM_ENABLE_LLD=ON \
      -DCMAKE_RANLIB=$(which llvm-ranlib) -DCMAKE_AR=$(which llvm-ar) \
      -DLLVM_PARALLEL_LINK_JOBS=48 \
      -DCMAKE_C_COMPILER_LAUNCHER=distcc -DCMAKE_CXX_COMPILER_LAUNCHER=distcc
The important flags here are -DCMAKE_C_COMPILER_LAUNCHER=distcc and -DCMAKE_CXX_COMPILER_LAUNCHER=distcc. The build system will use these when building (but not during configuration, which is convenient). The other flags just ensure a release build and force the build system to use lld, to run (locally) no more than 48 link jobs, and to use llvm-ranlib and llvm-ar from LLVM, which are faster than the usual GNU counterparts.
The cmake invocation should complete successfully.
Before we continue, we must make sure the variables CPATH, C_INCLUDE_PATH and CPLUS_INCLUDE_PATH are not set, otherwise the pump mode will refuse to work.
$ unset CPATH C_INCLUDE_PATH CPLUS_INCLUDE_PATH
Now we can invoke ninja, telling it to use 192 (= 48 × 4) concurrent jobs. We have to use the pump script to enable distcc's pump mode.
$ time pump ninja -j$((48 * 4))
__________Using distcc-pump from <install-distcc>/bin
__________Using 4 distcc servers in pump mode
[4382/4382] Linking CXX executable bin/c-index-test
__________Shutting down distcc-pump include server

real    3m0.985s
user    4m42.380s
sys     2m56.167s
3 minutes to compile clang+LLVM in Release mode is not bad 😀