Sometimes I run testsuites that, to make the most of the cores available in the system, spawn many processes so the tests can run in parallel. This makes the testsuites run much faster.

One side effect of these mechanisms, though, is that they may not handle cancellation correctly, for example when we press Ctrl-C.

Today we are going to see a way to mitigate this problem using systemd-run.

Systemd

Systemd is the system and service manager used in Linux these days, replacing earlier solutions based on shell scripts. In contrast to loosely coupled scripts, systemd is a more integrated solution. That has pros and cons, but the former seem to outweigh the latter and most Linux distributions have migrated to systemd.

Systemd uses the concept of units, of which there are different kinds, and we are interested in the service unit type.

Typically units are described by files on disk, and we can start them, stop them, and so on using the systemctl command.

systemd-run

The tool systemd-run allows us to create service units on the fly for ad-hoc purposes. By default systemd-run talks to the system-wide service manager, but with the --user option we can tell it to use the per-user service manager that is started when the user logs in (e.g. via ssh).

One interesting flag is the --shell flag, which allows us to run $SHELL as a systemd service. This means that systemd is in control of the processes created in there.

$ systemd-run --user --shell
Running as unit: run-u100.service
Press ^] three times within 1s to disconnect TTY.
$ uname -a
Linux mybox 6.1.0-17-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30) x86_64 GNU/Linux
$ exit
exit
Finished with result: success
Main processes terminated with: code=exited/status=0
Service runtime: 2.715s
CPU time consumed: 10ms

According to the documentation, the flag --shell is a shortcut for the command options --pty --same-dir --wait --collect --service-type=exec $SHELL.
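To make that expansion concrete, here is a small sketch that spells out the long form as a bash array and prints it, with a comment on what each option does. The array name shell_unit_cmd is made up for this illustration, and the fallback to /bin/sh when SHELL is unset is my own assumption, not something systemd-run does.

```shell
#!/usr/bin/env bash
# Long form of `systemd-run --user --shell`, following the documentation.
# shell_unit_cmd is a hypothetical name used only for this sketch.
shell_unit_cmd=(
  systemd-run
  --user                # talk to the per-user service manager
  --pty                 # connect the service to our terminal
  --same-dir            # run in the current working directory
  --wait                # block until the service finishes
  --collect             # unload the unit afterwards, even if it failed
  --service-type=exec   # consider the unit started once the binary is executed
  "${SHELL:-/bin/sh}"   # the command to run (the fallback is an assumption)
)

# Print the command instead of running it, so it can be inspected first.
printf '%q ' "${shell_unit_cmd[@]}"
echo
```

Running "${shell_unit_cmd[@]}" afterwards should start the shell as a transient service, just like the --shell shortcut.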

Use case

As part of my day job I often run the LLVM unit and regression tests. Once we have built LLVM, along with other projects such as clang, flang and lld, there is a target in the build system called check. This target builds the necessary infrastructure for the unit tests and invokes lit.

# Build LLVM and all the projects
user:~/llvm-build$ cmake --build .
# Run the unit and regression tests
user:~/llvm-build$ cmake --build . --target check

lit is implemented in Python and uses the multiprocessing module to exploit parallelism. Unfortunately, if you need to cancel the testsuite early (e.g., you realised you forgot to add a test) by pressing Ctrl-C on a machine with lots of threads, you will end up with a large number of runaway processes. This is easy to observe when LLVM is built in Debug mode, as everything runs much slower, including the tests. I have not dug further, but I assume this is a limitation of the multiprocessing module.

The following is an example of what typically happens if we press Ctrl-C on a machine with 16 cores (32 threads):

user:~/llvm-build$ cmake --build . --target check
[2/3] cd /home/user/soft/llvm-build... /usr/bin/python3 -m unittest discover
.................................................................................................................................
----------------------------------------------------------------------
Ran 129 tests in 1.403s

OK
[2/3] Running all regression tests
llvm-lit: /home/user/llvm-src/llvm/utils/lit/lit/llvm/config.py:488: note: using clang: /home/user/llvm-build/bin/clang
^C  interrupted by user, skipping remaining tests

Testing Time: 4.53s

Total Discovered Tests: 74509
  Skipped: 74509 (100.00%)
ninja: build stopped: interrupted by user.

If right after cancelling we check ps -x -f, we will see a large number of processes that have been detached from the lit process.

user:~/llvm-build$ ps -x -f
  …
  16574 pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/memory-legalizer-global-agent.ll.script
  16575 pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx600 -verify-machineinstrs
  16576 pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck --check-prefixes=GFX6 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-agent.ll
  16577 pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/memory-legalizer-local-singlethread.ll.script
  16578 pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx600 -verify-machineinstrs
  16579 pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck --check-prefixes=GFX6 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-singlethread.ll
  16580 pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/sched-group-barrier-pipeline-solver.mir.script
  16612 pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -march=amdgcn -mcpu=gfx908 -amdgpu-igrouplp-exact-solver -run-pass=machine-scheduler -o - /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
  16613 pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck -check-prefix=EXACT /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
  16583 pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/memory-legalizer-global-system.ll.script
  16584 pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx600 -verify-machineinstrs
  16585 pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck --check-prefixes=GFX6 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-system.ll
  16586 pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/memory-legalizer-flat-agent.ll.script
  16587 pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -verify-machineinstrs
  16588 pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck --check-prefixes=GFX7 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-agent.ll
  16590 pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/memory-legalizer-flat-singlethread.ll.script
  16591 pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -verify-machineinstrs
  16592 pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck --check-prefixes=GFX7 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-singlethread.ll
  16593 pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/memory-legalizer-flat-system.ll.script
  16594 pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -verify-machineinstrs
  16595 pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck --check-prefixes=GFX7 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-system.ll
  16596 pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/memory-legalizer-flat-wavefront.ll.script
  16597 pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -verify-machineinstrs
  16598 pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck --check-prefixes=GFX7 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-wavefront.ll
  16600 pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/tools/clang/test/CodeGen/X86/Output/x86_64-xsave.c.script
  16658 pts/2    R      0:04  |   \_ /home/user/llvm-build/bin/clang -cc1 -internal-isystem /home/user/llvm-build/lib/clang/18/include -nostdsysteminc /home/user/llvm-src/clang/test/CodeGen/X86/x86_64-xsave.c -DTEST_XSAVE -O0 
  16659 pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck /home/user/llvm-src/clang/test/CodeGen/X86/x86_64-xsave.c --check-prefix=XSAVE
  16603 pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/memory-legalizer-flat-workgroup.ll.script
  16607 pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -verify-machineinstrs
  16608 pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck --check-prefixes=GFX7 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-workgroup.ll
  16609 pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/tools/clang/test/CodeGen/X86/Output/rot-intrinsics.c.script
  16646 pts/2    R      0:05  |   \_ /home/user/llvm-build/bin/clang -cc1 -internal-isystem /home/user/llvm-build/lib/clang/18/include -nostdsysteminc -x c -ffreestanding -triple x86_64--linux -no-enable-noundef-analysis -emit-llvm /home/roge
  16647 pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck /home/user/llvm-src/clang/test/CodeGen/X86/rot-intrinsics.c --check-prefixes CHECK,CHECK-64BIT-LONG
  16621 pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/tools/clang/test/Headers/Output/opencl-builtins.cl.script
  16642 pts/2    R      0:09  |   \_ /home/user/llvm-build/bin/clang -cc1 -internal-isystem /home/user/llvm-build/lib/clang/18/include -nostdsysteminc -include /home/user/llvm-src/clang/test/Headers/opencl-builtins.cl /home/ro
  16622 pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/tools/clang/test/CodeGen/PowerPC/Output/ppc-smmintrin.c.script
  16652 pts/2    R      0:04  |   \_ /home/user/llvm-build/bin/clang -S -emit-llvm -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS /home/user/llvm-src/clang/test/CodeGen/PowerPC/ppc-smmintrin.c -fno-discard-
  16623 pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/tools/clang/test/CodeGen/X86/Output/x86_32-xsave.c.script
  16656 pts/2    R      0:04  |   \_ /home/user/llvm-build/bin/clang -cc1 -internal-isystem /home/user/llvm-build/lib/clang/18/include -nostdsysteminc /home/user/llvm-src/clang/test/CodeGen/X86/x86_32-xsave.c -DTEST_XSAVE -O0 
  16657 pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck /home/user/llvm-src/clang/test/CodeGen/X86/x86_32-xsave.c --check-prefix=XSAVE
  16624 pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/GlobalISel/Output/fdiv.f16.ll.script
  16627 pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -global-isel -march=amdgcn -mcpu=tahiti -denormal-fp-math=ieee -verify-machineinstrs
  16629 pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck -check-prefixes=GFX6,GFX6-IEEE /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f16.ll
  16625 pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/tools/clang/test/Headers/Output/opencl-c-header.cl.script
  16648 pts/2    R      0:05  |   \_ /home/user/llvm-build/bin/clang -cc1 -internal-isystem /home/user/llvm-build/lib/clang/18/include -nostdsysteminc -O0 -triple spir-unknown-unknown -internal-isystem ../../lib/Headers -include opencl-c.h -e
  16649 pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck /home/user/llvm-src/clang/test/Headers/opencl-c-header.cl
  16636 pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/mad-mix.ll.script
  16650 pts/2    R      0:05      \_ /home/user/llvm-build/bin/llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs
  16651 pts/2    S      0:00      \_ /home/user/llvm-build/bin/FileCheck -check-prefixes=GFX900,SDAG-GFX900 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/mad-mix.ll
  …

Granted, given enough time, those processes will eventually finish silently. But tests sometimes use intermediate files with deterministic names, so if we run them again immediately we risk spurious failures caused by two processes writing to the same file (a kind of filesystem data race).
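The general mechanism behind runaway processes can be reproduced in miniature. The sketch below is a generic illustration and not how lit works internally: a "parent" bash process starts a long-running child, we terminate only the parent, and then check whether the child is still alive. All variable names are hypothetical, and the fractional sleep assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Generic illustration: terminating a parent does not terminate its children.
# This is NOT how lit works internally; all names here are hypothetical.
pid_file="$(mktemp)"

# The "parent" starts a long-running child, records its PID and waits for it.
bash -c 'sleep 300 & echo "$!" > "$1"; wait' _ "${pid_file}" &
parent_pid=$!

# Wait until the parent has written the child's PID.
while [ ! -s "${pid_file}" ]; do sleep 0.1; done
child_pid="$(cat "${pid_file}")"

# Kill only the parent, as a cancellation that reaches a single process would.
kill -TERM "${parent_pid}"
wait "${parent_pid}" 2>/dev/null || true

# Check whether the child outlived its parent.
if kill -0 "${child_pid}" 2>/dev/null; then
  orphan_survived=yes
else
  orphan_survived=no
fi
echo "child ${child_pid} still running: ${orphan_survived}"

# Clean up by hand, which is exactly the chore we want to avoid.
kill "${child_pid}" 2>/dev/null || true
rm -f "${pid_file}"
```

Nothing ties the lifetime of the child to that of the parent, so something outside the process tree has to keep track of all the descendants if we want them gone.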

Running inside systemd-run

One of the downsides of running something as a service using systemd-run is that it won’t inherit the environment but instead will use the environment of the systemd session. Luckily this can be addressed using the -p EnvironmentFile=<file> option.

With all this, we can build a convenient shell script.

confine.sh
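#!/usr/bin/env bash
# Run a command as a transient systemd user service, so that systemd
# keeps track of every process it spawns.
set -euo pipefail

# Remove the temporary environment file on exit.
function cleanup() {
  [ -n "${ENV_FILE}" ] && rm -f "${ENV_FILE}"
}

ENV_FILE="$(mktemp)"
trap cleanup EXIT

# Dump the current environment so the service can load it.
env > "${ENV_FILE}"

# Same options as --shell, but running the given command instead of $SHELL.
systemd-run --user --pty --same-dir --wait --collect --service-type=exec -q \
            -p "EnvironmentFile=${ENV_FILE}" -- "$@"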

The flag -q silences the informational messages emitted by systemd-run on start and end.
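One caveat of dumping the environment with env > file is that values containing newlines, such as exported bash functions, end up spread over several lines, while EnvironmentFile= expects newline-separated assignments. A possible refinement, sketched here under the assumption that simply dropping such variables is acceptable, is to write only the single-line entries. The helper name write_env_file is made up for this example, and env -0 assumes GNU coreutils. Other special characters, such as quotes or backslashes, may still be interpreted by systemd's parser and are not handled here.

```shell
#!/usr/bin/env bash
# write_env_file is a hypothetical helper, not part of the original confine.sh.
# It copies the environment to a file, skipping entries whose value spans
# several lines (for example exported bash functions).
write_env_file() {
  local out="$1" entry
  : > "${out}"
  # env -0 (GNU coreutils) separates entries with NUL instead of newline,
  # so multi-line values stay in one piece while we read them.
  while IFS= read -r -d '' entry; do
    case "${entry}" in
      *$'\n'*) continue ;;   # multi-line value: leave it out
    esac
    printf '%s\n' "${entry}" >> "${out}"
  done < <(env -0)
}
```

In confine.sh this would replace the env > "${ENV_FILE}" line with write_env_file "${ENV_FILE}".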

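Now we can run the regression tests using this script and, even if we abort the execution by pressing Ctrl-C, systemd will kill the whole process tree: all the processes of the service live in its control group, and systemd cleans up whatever is left there when the unit stops.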

user:~/llvm-build$ confine.sh cmake --build . --target check
[2/3] cd /home/user/llvm-src/clang/bindings/python && /usr/bin/cmake -E env CLANG_NO_DEFAULT_CONFIG=1 CLANG_LIBRARY_PATH=/home/user/llvm-build/lib /usr/bin/python3 -m unittest discover
.................................................................................................................................
----------------------------------------------------------------------
Ran 129 tests in 1.410s

OK
[2/3] Running all regression tests
llvm-lit: /home/user/llvm-src/llvm/utils/lit/lit/llvm/config.py:488: note: using clang: /home/user/llvm-build/bin/clang
^C  interrupted by user, skipping remaining tests

Testing Time: 18.81s

Total Discovered Tests: 74509
  Skipped: 74509 (100.00%)
ninja: build stopped: interrupted by user.
user:~/llvm-build$ ps -x -f | grep "bash.*\.script" | wc -l
0

Hope this is useful :)