Think In Geek

Subtleties with loops

2024-02-11T09:50:00+00:00

A common task in imperative programming languages is writing a loop. A loop that can terminate requires a way to check the terminating condition and a way to repeatedly execute some part of the code. These two mechanisms exist in many forms: from the crudest approach of using an if and a goto (that must jump backwards in the code), to higher-level structured constructs like for and while, to very high-level constructs built around higher-order functions such as for_each and, more recently, in the context of GPU programming, the idea of a kernel function instantiated over an n-dimensional domain (where typically n ≤ 3 but most of the time n = 1).

These more advanced mechanisms make writing loops a commonplace task, typically regarded as uneventful. Yet, there are situations where things get subtler than we would like.

A ranged-loop over integers

Let’s consider a construct like this in some sort of pseudo-Pascal:

for i := lower to upper do
  S(i)

in which the statement S(i) is repeatedly executed with the value of the variable i starting at lower. Between repetitions we increase i by one. We stop repeating S(i) once it has been executed with i equal to upper. That is, S(upper) is executed but S(upper+1) is not.

As an example:

for i := 1 to 5 do
  writeln(i);

will print

1
2
3
4
5

A possible implementation

Let’s imagine how this could be compiled to a lower-level representation. Imagine we only have goto and if + goto (as a way to mimic a bit how current computers work).

Back to our loop:

for i := lower to upper do
  S(i)

could be implemented like

i := lower;
loop:
  if i <= upper then goto repeated;
  goto after_loop;
repeated:
  S(i);
  i := i + 1;
  goto loop;
after_loop:
  { ... }

Iterating a whole range of integers

Now consider that, for some reason, we want to iterate over all the 32-bit integers. For simplicity, we will assume unsigned integers, but signed integers face similar issues.

for i := 0 to 4294967295 do
  S(i)

It still seems not to be a big deal. But look at i, what type should it have?

If we use the implementation above, consider the last iteration, that is, when i = 4294967295. The i variable has to be able to represent 4294967295, so it has to be at least 32 bits wide. If it is exactly 32 bits wide, it will overflow when we compute i := i + 1;.

Here each system may behave differently. Some systems will simply wrap around and i will become 0. This is bad because 0 ≤ 4294967295, which is the condition we use to check whether we have to keep repeating, so the loop will never terminate. Other machines may trap, which is slightly better (we do terminate!) but prevents our correct program from running.
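To make the wrap-around case concrete, here is a small Python sketch that simulates the check-then-increment implementation with a fixed-width unsigned counter. The function name naive_loop, the 8-bit width and the max_steps cut-off are assumptions made for illustration only. Python integers do not overflow, so the wrap is emulated with a mask.

```python
def naive_loop(lower, upper, bits=8, max_steps=1000):
    """Simulate `for i := lower to upper` using the check-then-increment scheme.

    Returns the number of times S(i) ran, or None if the loop was still
    running after max_steps iterations (i.e. it looks non-terminating).
    """
    mask = (1 << bits) - 1      # largest value representable in `bits` bits
    i = lower & mask
    steps = 0
    while i <= upper:           # if i <= upper then goto repeated
        steps += 1              # S(i)
        if steps >= max_steps:
            return None         # give up: the counter keeps wrapping around
        i = (i + 1) & mask      # i := i + 1, wrapping on overflow
    return steps


print(naive_loop(0, 254))   # 255
print(naive_loop(0, 255))   # None
```

With an 8-bit counter, the bound 255 plays the role that 4294967295 plays for a 32-bit counter: the loop only misbehaves when the upper bound is the largest representable value.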

Now if you’re on a 64-bit system (or a system where the CPU provides efficient 64-bit integer arithmetic), this is easy to address: just make i a 64-bit integer and you’re done.

But this is a bit of an unsatisfying answer and further questions may arise at this point.

What if we want to iterate over all the 64-bit integers? Granted, this is a very large number of iterations, so we’re probably never going to terminate in a reasonable amount of time.

What if our CPU does not provide 64-bit integers and representing 64-bit magnitudes is expensive? The reality is that nowadays additions (and subtractions) are cheap for a CPU. For instance, on most 32-bit systems, adding or subtracting 64-bit integers can be done with two instructions (rather than the single one needed if 64-bit were natively supported).

What if we choose to use a 64-bit integer (no matter whether it is natively supported or not) but our loop has an unknown upper bound N? If N is less than 4294967295, it would be fine to use a 32-bit integer.

for i := 0 to N do
  S(i)

This leaves us with a bit of an uneasy feeling and while modern machines could use a larger integer, we probably want a solution that always works.

A safer, but less nice, implementation

Can we implement the loop in such a way that this issue never arises? The answer is yes, but the loop will not look as nice.

if lower > upper then goto after_loop;
i := lower;
repeated:
  S(i);
  if i = upper then goto after_loop;
  i := i + 1;
  goto repeated;
after_loop:
  { ... }

Let’s be honest, this construction does not look very nice but it avoids any overflow. So i only has to be as large as lower and upper. In other words, there is no need to make it larger “just in case”.
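As a sanity check of that claim, the following Python sketch simulates the test-before-increment scheme and raises an error if the counter would ever need a value outside its width. The function name safe_loop and the 8-bit default are assumptions for illustration, and the sketch assumes lower and upper themselves fit in the chosen width.

```python
def safe_loop(lower, upper, bits=8):
    """Simulate the test-before-increment scheme with a `bits`-wide counter.

    Returns the list of values passed to S(i). Raises OverflowError if the
    counter would ever need a value that does not fit in `bits` bits.
    """
    max_value = (1 << bits) - 1
    visited = []
    if lower > upper:               # if lower > upper then goto after_loop
        return visited
    i = lower
    while True:
        visited.append(i)           # S(i)
        if i == upper:              # if i = upper then goto after_loop
            break
        if i == max_value:
            raise OverflowError("counter would overflow")
        i += 1                      # i := i + 1
    return visited


print(len(safe_loop(0, 255)))   # 256
print(safe_loop(5, 3))          # []
```

Because the exit test runs before the increment, the increment is never reached when i already holds the largest representable value.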

Impact on optimisation

Compilers these days are very smart and the two loops can be compiled efficiently (they will emit almost the same code for both), so the less safe version has no particular performance advantage over the safer one.

From a teaching perspective, though, the less safe version is probably easier to explain.

What about C and C++?

But then, if we may overflow, what about a loop like this?

// Assume N is int
for (int i = 0; i <= N; i++)
  S(i);

According to the spec, the loop above is equivalent to the following code:

{
  int i = 0;
  while (i <= N) {
    S(i);
    i++;
  }
}

The C and C++ standards also tell us that signed integer overflow is undefined behaviour (UB).

Our loop is incorrect when N is 2147483647 (this is INT_MAX, assuming int is a 32-bit integer, which it typically is) because it triggers UB in i++.

When a program triggers UB, all bets are off in terms of its mandated behaviour. The observed behaviour typically becomes platform and/or compiler dependent. For example, with clang on x86-64 a loop like the above will loop forever at -O0 but seems to work at -O1 or higher optimisation levels, while with GCC on x86-64 it is likely not to terminate at any optimisation level.

In contrast, a loop like this

// Assume N is unsigned
for (unsigned i = 0; i <= N; i++)
  S(i);

will never terminate when N = 4294967295 (UINT_MAX for a 32-bit unsigned). In C and C++, unsigned integer arithmetic is well-defined and wraps around, so i++ goes back to 0 and i <= N keeps holding.

Based on the approach seen above, a way to correctly implement either case is as follows:

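// Example for the signed case.
// The guard mirrors the lower > upper check above: without it, a negative N
// would run S(0) and then never reach i == N.
if (N >= 0) {
  for (int i = 0; ; i++) {
    S(i);
    if (i == N) break;
  }
}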

Again, it does not look great but it is always correct.

Mitigate runaway processes

2024-01-05T10:34:00+00:00

Sometimes I find myself running testsuites that typically, in order to make the most of the several cores available in the system, spawn many processes so the tests can run in parallel. This allows running the testsuites much faster.

One side-effect of these mechanisms, though, is that they may not handle cancellation (say, pressing Ctrl-C) correctly.

Today we are going to see a way to mitigate this problem using systemd-run.

Systemd

Systemd is the system and service manager used in Linux these days, replacing earlier solutions based on shell scripts. In contrast to loosely coupled scripts, systemd is a more integrated solution. This has pros and cons, but the former seem to outweigh the latter, and most Linux distributions have migrated to systemd.

Systemd uses the concept of units, of which there are different kinds, and we are interested in the service unit type.

Typically units are described by files on disk, and we can start them, stop them, etc. using the systemctl command.

systemd-run

The tool systemd-run allows us to create service units on the fly for ad-hoc purposes. By default systemd-run will try to use the global (system-wide) systemd session, but we can tell it to use the systemd session created when the user logged on (e.g. via ssh) using the command option --user.

One interesting flag is the --shell flag, which allows us to run $SHELL as a systemd service. This means that systemd is in control of the processes created in there.

$ systemd-run --user --shell
Running as unit: run-u100.service
Press ^] three times within 1s to disconnect TTY.
$ uname -a
Linux mybox 6.1.0-17-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30) x86_64 GNU/Linux
$ exit
exit
Finished with result: success
Main processes terminated with: code=exited/status=0
Service runtime: 2.715s
CPU time consumed: 10ms

According to the documentation, the flag --shell is a shortcut for the command options --pty --same-dir --wait --collect --service-type=exec $SHELL.

Use case

As part of my day job I often run the LLVM unit and regression tests. Once we have built LLVM, along with other projects such as clang, flang and lld, there is a target in the build system called check. This target builds the necessary infrastructure for the unit tests and invokes lit:

# Build LLVM and all the projects
user:~/llvm-build$ cmake --build .
# Run the unit and regression tests
user:~/llvm-build$ cmake --build . --target check

lit is implemented in Python and, in order to exploit parallelism, uses the multiprocessing module. Unfortunately, if for some reason you need to cancel the testsuite execution early (e.g., you realised you forgot to add a test), say by pressing Ctrl-C, and your machine has lots of threads, you will end up with a large number of runaway processes. This is easy to observe when LLVM is built in Debug mode, as everything runs much slower, including tests. I have not dug further but I assume this is a limitation of the multiprocessing module.

Following is an example of what typically happens if we press Ctrl-C on a machine with 16 cores (32 threads):

user:~/llvm-build$ cmake --build . --target check
[2/3] cd /home/user/soft/llvm-build... /usr/bin/python3 -m unittest discover
.................................................................................................................................
----------------------------------------------------------------------
Ran 129 tests in 1.403s

OK
[2/3] Running all regression tests
llvm-lit: /home/user/llvm-src/llvm/utils/lit/lit/llvm/config.py:488: note: using clang: /home/user/llvm-build/bin/clang
^C  interrupted by user, skipping remaining tests

Testing Time: 4.53s

Total Discovered Tests: 74509
  Skipped: 74509 (100.00%)
ninja: build stopped: interrupted by user.

If right after cancelling we check ps -x -f, we will see a large number of processes that have been detached from the lit process.

user:~/llvm-build$ ps -x -f
  …
pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/memory-legalizer-global-agent.ll.script
pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx600 -verify-machineinstrs
pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck --check-prefixes=GFX6 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-agent.ll
pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/memory-legalizer-local-singlethread.ll.script
pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx600 -verify-machineinstrs
pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck --check-prefixes=GFX6 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-singlethread.ll
pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/sched-group-barrier-pipeline-solver.mir.script
pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -march=amdgcn -mcpu=gfx908 -amdgpu-igrouplp-exact-solver -run-pass=machine-scheduler -o - /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck -check-prefix=EXACT /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/memory-legalizer-global-system.ll.script
pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx600 -verify-machineinstrs
pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck --check-prefixes=GFX6 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-system.ll
pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/memory-legalizer-flat-agent.ll.script
pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -verify-machineinstrs
pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck --check-prefixes=GFX7 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-agent.ll
pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/memory-legalizer-flat-singlethread.ll.script
pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -verify-machineinstrs
pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck --check-prefixes=GFX7 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-singlethread.ll
pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/memory-legalizer-flat-system.ll.script
pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -verify-machineinstrs
pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck --check-prefixes=GFX7 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-system.ll
pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/memory-legalizer-flat-wavefront.ll.script
pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -verify-machineinstrs
pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck --check-prefixes=GFX7 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-wavefront.ll
pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/tools/clang/test/CodeGen/X86/Output/x86_64-xsave.c.script
pts/2    R      0:04  |   \_ /home/user/llvm-build/bin/clang -cc1 -internal-isystem /home/user/llvm-build/lib/clang/18/include -nostdsysteminc /home/user/llvm-src/clang/test/CodeGen/X86/x86_64-xsave.c -DTEST_XSAVE -O0 
pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck /home/user/llvm-src/clang/test/CodeGen/X86/x86_64-xsave.c --check-prefix=XSAVE
pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/memory-legalizer-flat-workgroup.ll.script
pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -verify-machineinstrs
pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck --check-prefixes=GFX7 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-workgroup.ll
pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/tools/clang/test/CodeGen/X86/Output/rot-intrinsics.c.script
pts/2    R      0:05  |   \_ /home/user/llvm-build/bin/clang -cc1 -internal-isystem /home/user/llvm-build/lib/clang/18/include -nostdsysteminc -x c -ffreestanding -triple x86_64--linux -no-enable-noundef-analysis -emit-llvm /home/roge
pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck /home/user/llvm-src/clang/test/CodeGen/X86/rot-intrinsics.c --check-prefixes CHECK,CHECK-64BIT-LONG
pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/tools/clang/test/Headers/Output/opencl-builtins.cl.script
pts/2    R      0:09  |   \_ /home/user/llvm-build/bin/clang -cc1 -internal-isystem /home/user/llvm-build/lib/clang/18/include -nostdsysteminc -include /home/user/llvm-src/clang/test/Headers/opencl-builtins.cl /home/ro
pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/tools/clang/test/CodeGen/PowerPC/Output/ppc-smmintrin.c.script
pts/2    R      0:04  |   \_ /home/user/llvm-build/bin/clang -S -emit-llvm -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS /home/user/llvm-src/clang/test/CodeGen/PowerPC/ppc-smmintrin.c -fno-discard-
pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/tools/clang/test/CodeGen/X86/Output/x86_32-xsave.c.script
pts/2    R      0:04  |   \_ /home/user/llvm-build/bin/clang -cc1 -internal-isystem /home/user/llvm-build/lib/clang/18/include -nostdsysteminc /home/user/llvm-src/clang/test/CodeGen/X86/x86_32-xsave.c -DTEST_XSAVE -O0 
pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck /home/user/llvm-src/clang/test/CodeGen/X86/x86_32-xsave.c --check-prefix=XSAVE
pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/GlobalISel/Output/fdiv.f16.ll.script
pts/2    R      0:10  |   \_ /home/user/llvm-build/bin/llc -global-isel -march=amdgcn -mcpu=tahiti -denormal-fp-math=ieee -verify-machineinstrs
pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck -check-prefixes=GFX6,GFX6-IEEE /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f16.ll
pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/tools/clang/test/Headers/Output/opencl-c-header.cl.script
pts/2    R      0:05  |   \_ /home/user/llvm-build/bin/clang -cc1 -internal-isystem /home/user/llvm-build/lib/clang/18/include -nostdsysteminc -O0 -triple spir-unknown-unknown -internal-isystem ../../lib/Headers -include opencl-c.h -e
pts/2    S      0:00  |   \_ /home/user/llvm-build/bin/FileCheck /home/user/llvm-src/clang/test/Headers/opencl-c-header.cl
pts/2    S      0:00  \_ /bin/bash /home/user/llvm-build/test/CodeGen/AMDGPU/Output/mad-mix.ll.script
pts/2    R      0:05      \_ /home/user/llvm-build/bin/llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs
pts/2    S      0:00      \_ /home/user/llvm-build/bin/FileCheck -check-prefixes=GFX900,SDAG-GFX900 /home/user/llvm-src/llvm/test/CodeGen/AMDGPU/mad-mix.ll
  …

Granted, given enough time, those processes will eventually finish silently. But given that tests sometimes use deterministic intermediate files, if we run them again immediately we risk spurious failures caused by two processes writing to the same file (i.e. a kind of filesystem data race).

Running inside systemd-run

One of the downsides of running something as a service using systemd-run is that it won’t inherit the environment but instead will use the environment of the systemd session. Luckily this can be addressed using the -p EnvironmentFile=<file> option.

With all this, we can build a convenient shell script.

confine.sh

#!/usr/bin/env bash
set -euo pipefail

function cleanup() {
  [ -n "${ENV_FILE}" ] && rm -f "${ENV_FILE}"
}

ENV_FILE="$(mktemp)"
trap cleanup EXIT

env > "${ENV_FILE}"

systemd-run --user --pty --same-dir --wait --collect --service-type=exec -q \
            -p "EnvironmentFile=${ENV_FILE}" -- "$@"

The flag -q silences the informational messages emitted by systemd-run on start and end.

Now we can run the regression tests using this convenient script, and even if we abort the execution by pressing Ctrl-C, systemd will kill the whole process tree.

user:~/llvm-build$ confine.sh cmake --build . --target check
[2/3] cd /home/user/llvm-src/clang/bindings/python && /usr/bin/cmake -E env CLANG_NO_DEFAULT_CONFIG=1 CLANG_LIBRARY_PATH=/home/user/llvm-build/lib /usr/bin/python3 -m unittest discover
.................................................................................................................................
----------------------------------------------------------------------
Ran 129 tests in 1.410s

OK
[2/3] Running all regression tests
llvm-lit: /home/user/llvm-src/llvm/utils/lit/lit/llvm/config.py:488: note: using clang: /home/user/llvm-build/bin/clang
^C  interrupted by user, skipping remaining tests

Testing Time: 18.81s

Total Discovered Tests: 74509
  Skipped: 74509 (100.00%)
ninja: build stopped: interrupted by user.
user:~/llvm-build$ ps -x -f | grep "bash.*\.script" | wc -l
0
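For readers who would rather drive this from Python, the following is a rough sketch of how the same systemd-run command line could be assembled. The function name confined_command and the example environment file path are made up for illustration; the flags are exactly those used in confine.sh.

```python
import shlex


def confined_command(env_file, argv):
    """Return the systemd-run argument list used by confine.sh for `argv`."""
    return [
        "systemd-run", "--user", "--pty", "--same-dir", "--wait", "--collect",
        "--service-type=exec", "-q",
        "-p", f"EnvironmentFile={env_file}",
        "--", *argv,   # everything after "--" belongs to the wrapped command
    ]


# Hypothetical env file path, standing in for the one created by mktemp.
cmd = confined_command("/tmp/env.example",
                       ["cmake", "--build", ".", "--target", "check"])
print(shlex.join(cmd))
```

Passing the list to subprocess.run would then play the role of the last line of the shell script. Keeping the wrapped command after the -- marker follows the usual convention for separating a tool's own options from those of the command it runs.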

Hope this is useful :)

Locally testing API Gateway Docker based Lambdas

2023-12-24T00:00:00+00:00

AWS Lambda is one of those technologies that makes the distinction between infrastructure and application code quite blurry. There are many frameworks out there, some of them quite popular, such as AWS Amplify and the Serverless Framework, which allow you to define your Lambda and your application code, and which provide tools to package, provision and deploy those Lambdas (using CloudFormation under the hood). They also provide tools to run the functions locally for testing, which is particularly useful if they are invoked using technologies such as API Gateway. Sometimes, however, especially if your organisation has adopted other Infrastructure as Code tools such as Terraform, you might want to just provision a function with simpler IaC tools, and keep the application deployment steps separate. Let us explore an alternative method to run and test API Gateway based Lambdas locally without the need to bring in big frameworks such as the ones mentioned earlier.

We will make some assumptions before moving forward:

Our Lambda will be designed to be invoked by AWS API Gateway, using the Proxy Integration.
Our Lambda will be Docker based.
Our Lambda has already been provisioned by another tool, so our only concern here is how to locally build it and run it the same way any other client would do via API Gateway.

Lambda code and Docker image

Let us follow the AWS Documentation and write a very simple function in Python which we can use throughout this project.

The Python code for our handler will be straightforward:

lambda_function.py

import json

def handler(event, context):
    return {
        "isBase64Encoded": False,
        "statusCode": 200,
        "body": json.dumps(event),
        "headers": {"content-type": "application/json"},
    }

This handler will simply return a 200 response code with the Lambda event as its body, in JSON format.
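Before packaging anything, the handler can be exercised as a plain Python function. The sketch below repeats the handler so that it is self-contained and calls it with a made-up event; sample_event is an assumption for illustration, and passing None as the context only works because this handler never touches it.

```python
import json


def handler(event, context):
    # Same handler as in lambda_function.py above.
    return {
        "isBase64Encoded": False,
        "statusCode": 200,
        "body": json.dumps(event),
        "headers": {"content-type": "application/json"},
    }


# A made-up event; the handler ignores the context, so None is enough here.
sample_event = {"hello": "world"}
response = handler(sample_event, None)

print(response["statusCode"])          # 200
print(json.loads(response["body"]))    # {'hello': 'world'}
```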

In order to package this function so that the AWS runtime can execute it, we will make use of the provided AWS base Docker image, and add our code to it (at the time of writing this article Python’s latest version was 3.12). The dockerfile below assumes that our code is written in a file named lambda_function.py and that we have a requirements.txt file listing our dependencies (in our case the file can be empty).

dockerfile

FROM public.ecr.aws/lambda/python:3.12

# Copy requirements.txt
COPY requirements.txt ${LAMBDA_TASK_ROOT}

# Install the specified packages
RUN pip install -r requirements.txt

# Copy function code
COPY lambda_function.py ${LAMBDA_TASK_ROOT}

# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD [ "lambda_function.handler" ]

Running and testing the Lambda function

In order to test that this all works as expected, we need to build that Docker image and run it:

docker build -t docker-image:test .
docker run -p 9000:8080 docker-image:test

The above commands will do exactly that, and map the container port 8080 to the local port 9000.

As per the documentation, in order to test this function and see an HTTP response, it is not sufficient to just make an HTTP request to http://localhost:9000. If we were to do this, we would simply get back a 404 response. After all, our function could be triggered in the real world not just by HTTP requests but by many other events, such as a change to an S3 bucket, or a message being pulled from an SQS queue.

Behind the scenes, any invocation of a Lambda function eventually happens via an API call. When we make an HTTP request that is eventually served by a Lambda function, what is happening is that some other service (for example AWS API Gateway, or an AWS ALB) transforms that HTTP request into an event, then that event is passed to the Lambda Invoke method as a parameter, and the Lambda response gets mapped back to an HTTP response.

The AWS provided base Docker images already come with something called the Runtime Interface Emulator, which takes care of acting as that proxy for you, allowing the invocation of the function via an HTTP API call.

In order to get our local Lambda to reply with a response, this is what we need to do instead:

curl "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'

This will invoke the Lambda with an empty event. If our Lambda is to be behind AWS API Gateway using a Proxy Integration, the real event it would receive would look like this:

{
  "request_uri": "/",
  "request_headers": {
    "user-agent": "curl/8.1.2",
    "content-type": "application/json",
    "accept": "*/*",
    "host": "localhost:8000"
  },
  "request_method": "GET",
  "request_uri_args": {}
}
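The curl invocation shown earlier can also be issued from Python using only the standard library. This is a minimal sketch: the helper name build_invoke_request is hypothetical, and the host and port defaults assume the docker run port mapping used above. The request is only built here, not sent, so the snippet runs without the container.

```python
import json
import urllib.request


def build_invoke_request(event, host="localhost", port=9000):
    """Build (but do not send) the POST request that the curl command performs."""
    url = f"http://{host}:{port}/2015-03-31/functions/function/invocations"
    data = json.dumps(event).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


request = build_invoke_request({})
print(request.full_url)
print(request.data)   # b'{}'

# With the container running, the request could be sent like this:
# with urllib.request.urlopen(request) as reply:
#     print(json.load(reply))
```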

In some cases testing our Lambda locally by carefully crafting curl commands with JSON payloads might be a good option, but sometimes it is necessary to be able to locally hit our Lambda just like we would do if we had the AWS API Gateway Proxy Integration in place. A good example of this might be if we want to test locally how our Lambda would interact with other services we are also running locally, such as a web browser making a GET HTTP request. This is where big footprint frameworks come in handy, since they have those tools built in.

Kong API Gateway to the rescue

An alternative way to get the behaviour that frameworks such as Amplify or the Serverless Framework offer for testing Lambdas locally is to use an open source API Gateway tool called Kong. Kong is a big API Gateway product and offers many features, but in a nutshell what it does is take an incoming HTTP request, optionally transform it, send it to a downstream service, optionally transform the response, and send that back to the client. One of the many downstream services Kong supports out of the box, through a plugin, is AWS Lambda. One could argue that using something like Kong just to test our Lambda is no different from going the framework route. However, there are a couple of things I find particularly relevant here:

Kong can be run via Docker, which we already need to package and run our Lambda. This means we do not have to install any new tool in our local setup.
This solution allows us to keep our Lambda setup small and simple, and we are not forced to follow any Framework ways of organising our source code.

So our final setup is going to look like this:

The HTTP request will be sent to Kong, then Kong will transform that request into a Lambda API call, the Lambda will receive that call with an HTTP event, and will respond with a JSON payload, which Kong will transform again and send back to the HTTP client.

In order for this to work, we need to configure Kong to proxy HTTP requests to our Lambda. We can do this by using a declarative configuration that uses the aws-lambda plugin on the / route.

We can achieve this using this kong.yml configuration file:

kong.yml

_format_version: "3.0"
_transform: true

routes:
- name: lambda
  paths: [ "/" ]

plugins:
- route: lambda
  name: aws-lambda
  config:
    aws_region: eu-west-1
    aws_key: DUMMY_KEY
    aws_secret: DUMMY_SECRET
    function_name: function
    host: lambda
    port: 8080
    disable_https: true
    forward_request_body: true
    forward_request_headers: true
    forward_request_method: true
    forward_request_uri: true
    is_proxy_integration: true

A few things worth mentioning:

The aws_key and aws_secret are mandatory for the plugin to work, however we do not need to put any real secrets in there, since the invocation will happen locally.
function_name should stay hardcoded as function, as this is the name the Runtime Interface Emulator uses by default.
The host and port values there should point to your local docker container running the Lambda function. In our case we use lambda and 8080 as we will run all this solution in a single Docker Compose setup where the Lambda runs in a container named lambda.
We need to set disable_https to true as our Lambda container is not able to handle SSL.
The rest of the configuration options can be tweaked depending on our specific needs. They are all documented on the Kong website. The values shown here will work for an AWS Lambda Proxy Integration setup using AWS API Gateway, but the Kong plugin supports other types of integrations.

Putting it all together

So far we have built a Docker based Lambda function and we are able to run it locally. We have also seen how to configure Kong API Gateway to proxy HTTP requests to that function. We will now look at what a Docker Compose setup might look like to run it all in a single project and command.

The full source code for this can be found in brafales/docker-lambda-kong. I recommend checking it out to see the whole project structure.

We will assume we have the following folders in our root:

lambda: here we will store the Lambda function source code and its Dockerfile.
kong: here we will store the declarative configuration for Kong which will allow us to set it up as a proxy for our function.

And then in the root we can have our docker-compose.yml file:

docker-compose.yml

services:
  lambda:
    build:
      context: lambda
    container_name: lambda
    networks:
      - lambda-example
  kong:
    image: kong:latest
    container_name: kong
    ports:
      - "8000:8000"
    environment:
      KONG_DATABASE: "off"
      KONG_DECLARATIVE_CONFIG: /usr/local/kong/declarative/kong.yml
    volumes:
      - ./kong:/usr/local/kong/declarative
    networks:
      - lambda-example

networks:
  lambda-example:

This file does the following:

It creates a Docker network called lambda-example. This is optional, since the default network created by Compose would work equally well.
It defines a Docker container named lambda and instructs compose to build it using the contents of the lambda folder.
It defines a Docker container named kong, using the Docker image kong:latest, and mapping our kong folder to the container path /usr/local/kong/declarative. This will allow the container to read our declarative config file, which we set as an environment variable KONG_DECLARATIVE_CONFIG. We also set KONG_DATABASE to off to instruct Kong not to search for a database to read its config from, and finally map the container port 8000 to our localhost port 8000.

With all this in place, we can now simply run the following command to spin it all up:

docker compose up

Once all is up and running, we can now reach our Lambda function using curl or any other HTTP client like we would normally do if it was deployed to AWS behind an API Gateway:

➜ curl -s localhost:8000 | jq .
{
  "request_method": "GET",
  "request_body": "",
  "request_body_args": {},
  "request_uri": "/",
  "request_headers": {
    "user-agent": "curl/8.1.2",
    "host": "localhost:8000",
    "accept": "*/*"
  },
  "request_body_base64": true,
  "request_uri_args": {}
}

➜ curl -s -X POST localhost:8000/ | jq .
{
  "request_method": "POST",
  "request_body": "",
  "request_body_args": {},
  "request_uri": "/",
  "request_headers": {
    "user-agent": "curl/8.1.2",
    "host": "localhost:8000",
    "accept": "*/*"
  },
  "request_body_base64": true,
  "request_uri_args": {}
}

➜ curl -s 'localhost:8000/?foo=bar' | jq .
{
  "request_method": "GET",
  "request_body": "",
  "request_body_args": {},
  "request_uri": "/?foo=bar",
  "request_headers": {
    "user-agent": "curl/8.1.2",
    "host": "localhost:8000",
    "accept": "*/*"
  },
  "request_body_base64": true,
  "request_uri_args": {
    "foo": "bar"
  }
}
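Since the events forwarded by Kong carry the HTTP method and the query arguments, a handler can branch on them. The following is only a sketch: the field names request_method and request_uri_args are taken from the sample outputs above, while the 405 response for non-GET methods is an arbitrary choice made for illustration.

```python
import json


def handler(event, context):
    """Echo the query arguments forwarded by Kong, accepting GET only."""
    # Field names as they appear in the sample outputs above; .get() keeps the
    # handler usable when it is invoked with an empty event.
    method = event.get("request_method", "UNKNOWN")
    args = event.get("request_uri_args", {})

    if method != "GET":
        status, payload = 405, {"error": f"{method} not supported"}
    else:
        status, payload = 200, {"args": args}

    return {
        "isBase64Encoded": False,
        "statusCode": status,
        "body": json.dumps(payload),
        "headers": {"content-type": "application/json"},
    }


event = {"request_method": "GET", "request_uri": "/?foo=bar",
         "request_uri_args": {"foo": "bar"}}
print(handler(event, None)["body"])   # {"args": {"foo": "bar"}}
```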

Graphical notifications for long-running tasks

2023-09-03T21:15:00+00:00

In my day job I often have to perform long-running tasks that do not require constant attention (e.g. compiling a compiler) on Linux systems. When this happens, it is unavoidable to context switch to other tasks, even if experts advise against it. It turns out that watching compilation output scroll by is not always very interesting.

I would like to be able to resume working on the original task as soon as possible. So the idea is to receive a notification when the task ends.

Local notifications

If the time-consuming task is being run locally and we are using a graphical environment we can use the tool notify-send to send ourselves a notification when the command ends. We can combine this in a convenient script like the one below.

runot

#!/usr/bin/env bash

# Run the command, keeping each argument intact even if it contains spaces.
"$@"
result="$?"

if [ "$result" != "0" ];
then
  icon="dialog-warning"
else
  icon="dialog-information"
fi
notify-send "--icon=$icon" "$*"

exit "$result"

We execute the command and then we call notify-send with the executed command as the message and an icon chosen according to the execution result.

$ runot very slow thing
< "very slow thing" runs >
< a notification appears >
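
A wrapper like runot has to hand its arguments over to the command it runs, and the way they are expanded matters. The sketch below uses hypothetical helper functions (count_args, run_with_star and run_with_at are not part of runot) to show that an unquoted $* splits arguments again on whitespace, while "$@" passes them through unchanged.

```shell
# Hypothetical helpers, only meant to illustrate argument forwarding.

count_args() {
  # Print how many arguments were received.
  echo "$#"
}

run_with_star() {
  # Unquoted $* is split again on whitespace: "hello world" becomes two words.
  count_args $*
}

run_with_at() {
  # "$@" forwards every argument exactly as it was received.
  count_args "$@"
}

run_with_star grep "hello world" file.txt   # prints 4
run_with_at   grep "hello world" file.txt   # prints 3
```

For a command such as compiling with a quoted flag value, only the second form runs what was actually typed.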

How does this work?

Without going into too much detail, notify-send connects to D-Bus and sends a notification, as specified in the Desktop Notifications Specification. A daemon configured by your desktop environment is waiting for notifications. Upon receiving one, it displays the notification graphically.
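
To make that exchange a bit more concrete, the same request can be approximated with gdbus, the generic D-Bus client that ships with GLib. The Notify method and the order of its arguments come from the Desktop Notifications Specification. The function name notify_raw, the application name "runot" and the NOTIFY_DRY_RUN variable are assumptions made up for this sketch.

```shell
# notify_raw SUMMARY [BODY]
# Hypothetical helper that calls org.freedesktop.Notifications.Notify directly.
# Arguments of Notify, in order: app_name, replaces_id, app_icon, summary,
# body, actions, hints, expire_timeout (milliseconds).
notify_raw() {
  local summary="$1" body="${2:-}"
  set -- gdbus call --session \
    --dest org.freedesktop.Notifications \
    --object-path /org/freedesktop/Notifications \
    --method org.freedesktop.Notifications.Notify \
    runot 0 dialog-information "$summary" "$body" "[]" "{}" 5000
  if [ -n "${NOTIFY_DRY_RUN:-}" ]; then
    # Only print the command; handy when no session bus is available.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Running notify_raw "Build finished" "All targets are up to date" in a graphical session should pop up a notification much like notify-send would.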

Remote notifications

D-Bus is a really cool technology that allows different applications to interoperate, and it is especially useful in a desktop environment. That said, D-Bus is typically scoped to user sessions on the same computer and, while not impossible, the message bus is not meant to span several computers.

This means that if, rather than working locally, we work over SSH on a remote-machine, we will not be able to send notifications to our local-machine desktop straightforwardly. There are two options that we can use here. Neither is perfect, but both allow us to deliver notifications to our desktop computer from a remote system.

Forward the UNIX socket
Use a remote notification daemon

Forward the UNIX socket

D-Bus clients know where to find the message bus by reading the environment variable DBUS_SESSION_BUS_ADDRESS. In most systems nowadays it looks like this

$ echo $DBUS_SESSION_BUS_ADDRESS
unix:path=/run/user/9999/bus

This syntax means that the D-Bus server, started by some other application upon login, can be reached at the specified path. Here the path is a UNIX socket, so in principle it is only accessible to processes on the same machine.

We can forward a UNIX socket using ssh, like we usually do with TCP ports.

(local-machine) $ ssh -R /some/well/known/path/dbus.socket:${DBUS_SESSION_BUS_ADDRESS/unix:path=/} user@remote-machine
(remote-machine) $ export DBUS_SESSION_BUS_ADDRESS=unix:path=/some/well/known/path/dbus.socket
(remote-machine) $ notify-send "Hello world"
< notification appears in the local machine as if sent locally >

You can use any path for /some/well/known/path/dbus.socket, including a subdirectory of your home directory.
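
The ${DBUS_SESSION_BUS_ADDRESS/unix:path=/} expansion used above is a bash substitution that deletes the unix:path= prefix. D-Bus addresses may also carry extra comma-separated keys after the path (for instance a guid), which that substitution would leave in place. The following sketch is a slightly more defensive way to extract the socket path. The function name bus_socket_path is made up for illustration, and it assumes that path= is the first key of a single unix: address.

```shell
# bus_socket_path [ADDRESS]
# Hypothetical helper: print the socket path of a "unix:path=..." address.
# Defaults to the value of DBUS_SESSION_BUS_ADDRESS.
bus_socket_path() {
  local address="${1:-${DBUS_SESSION_BUS_ADDRESS:-}}"
  local path
  case "$address" in
    unix:path=*)
      path="${address#unix:path=}"   # drop the transport prefix
      path="${path%%,*}"             # drop any trailing ",key=value" pairs
      printf '%s\n' "$path"
      ;;
    *)
      echo "unsupported D-Bus address: $address" >&2
      return 1
      ;;
  esac
}
```

With it, the forwarding command could be written as ssh -R "/some/well/known/path/dbus.socket:$(bus_socket_path)" user@remote-machine.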

Pros

The notification is reported as if it had been sent by a local process, so it integrates very well with the environment.

From a usability point of view this is the strongest point of this approach.

Cons

This only works if our user has the same UID and GID on local-machine and remote-machine. This can be easy to achieve in corporate environments where all systems use a unified login system based on LDAP or Active Directory.

For security reasons, the default configuration of D-Bus only allows processes of the same user to access the bus. The protocol checks that the uid and gid of the process connecting to the bus match those of the process that started the D-Bus daemon. This prevents other local processes, not belonging to our user, from connecting to our D-Bus daemon.

This may be an important limitation in many systems (e.g. my laptop at work is not integrated in the LDAP of other systems or, for security reasons, we have different credentials in development vs production systems).

You need to remove the UNIX socket on the remote machine every time you start a session, but not for subsequent ssh connections.

This can be mitigated by using a dedicated script to connect to the remote machine as a way to initiate the “session”. You would run it only for the first connection; later connections would just use a regular ssh command.

ssh-session

#!/usr/bin/env bash

remote="$1"
ssh "$remote" "rm -f /some/well/known/path/dbus.socket"
exec ssh -R "/some/well/known/path/dbus.socket:${DBUS_SESSION_BUS_ADDRESS/unix:path=/}" "$remote"

(local-machine) $ ssh-session user@remote-machine

This script is a bit simplistic and assumes you can execute remote commands without having to enter a password (e.g. because you are using an SSH key). I have not tried it, but perhaps with ProxyCommand this initial script could be made more convenient, without having to enter the password twice.

Alternatively, if we can configure the SSH server on remote-machine, we can add the option StreamLocalBindUnlink yes to /etc/ssh/sshd_config. With it, sshd removes (unlinks) an existing /some/well/known/path/dbus.socket before creating the new one, so we don’t have to remove it ourselves beforehand.

Note that once you close the ssh connection that forwarded the UNIX socket, notifications will stop working. So you probably want to close that connection last if you are working with several ssh sessions to remote-machine at the same time.

You need to set the DBUS_SESSION_BUS_ADDRESS environment variable first.

This can be addressed as described in this post by Nikhil. We can add the following to our .bashrc file.

.bashrc

…
# If the shell is running over SSH, override the session DBus socket to point
# to the one forwarded over SSH.
if [ -n "$SSH_CONNECTION" ]; then
  export DBUS_SESSION_BUS_ADDRESS=unix:path=/some/well/known/path/dbus.socket
fi
…
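
When guarding on SSH_CONNECTION, the quoting of the variable matters more than it may seem. The sketch below compares the two spellings of the test, using hypothetical function names (unquoted_check and quoted_check). With an empty or unset variable the unquoted test collapses to [ -n ], which is always true. With a real value, which contains several space-separated fields, the unquoted test is malformed and fails with an error.

```shell
# Hypothetical helpers comparing the two spellings of the test.

unquoted_check() {
  # Empty variable: this becomes `[ -n ]`, a one-argument test that only
  # checks that the string "-n" is not empty, so it always succeeds.
  if [ -n $SSH_CONNECTION ]; then echo "ssh"; else echo "local"; fi
}

quoted_check() {
  # Quoted: the test always receives exactly one operand after -n.
  if [ -n "$SSH_CONNECTION" ]; then echo "ssh"; else echo "local"; fi
}

SSH_CONNECTION=""
unquoted_check   # prints "ssh", which is wrong for a local shell
quoted_check     # prints "local"
```

The documentation addresses used in the test values are placeholders; any value with spaces in it behaves the same way.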

Use a remote notification daemon

This approach is a bit more involved, but it basically relies on forwarding X11 and running a notification daemon on remote-machine, which we will activate using D-Bus itself. The notification daemon displays the notifications using X11, so they appear on our local-machine like the windows of any other forwarded X11 application.

Note: this approach assumes the user is not running a graphical session on remote-machine. If they are, this procedure may confuse the graphical environment when sending notifications.

Pros

Does not need uid/gid synchronisation between local-machine and remote-machine.

This was the main limitation with the earlier approach.

Cons

Needs X11 forwarding which may not always be available

We need to pass -X when connecting to remote-machine.

(local-machine) $ ssh -X remote-machine

Alternatively we can add a configuration entry to the ~/.ssh/config of local-machine.

~/.ssh/config

…
Host remote-machine
  HostName remote-machine.example.com
  ForwardX11 "yes"
…

Relies on systemd and D-Bus

These two components are present in most distributions these days, so they can be assumed.

We also assume that a D-Bus session is running when we connect to remote-machine (i.e. on remote-machine, the environment variable DBUS_SESSION_BUS_ADDRESS points to some UNIX socket of remote-machine). Again, most distributions these days provide this functionality out of the box. Setting this up is out of scope of this post.

The result is less integrated as we use a notification daemon different to the one in the graphical environment of local-machine.

There are a number of different notification daemons, some of which can be configured to suit one’s taste. In this example we will use notification-daemon, which is a reference implementation of the notification protocol and seems to work fine for our needs. The Arch wiki has a list of notification daemons. Recall that the notification daemon runs on remote-machine.

Activation via D-Bus

D-Bus activation means that every time we invoke notify-send, if no notification daemon is running, one will be started for us. If one is running already, that one will be used by notify-send.

There are two files that we need to create on remote-machine to set up D-Bus activation.

First, ~/.local/share/dbus-1/services/org.Notifications.service, which tells D-Bus which systemd unit and daemon provide the service.

~/.local/share/dbus-1/services/org.Notifications.service

[D-BUS Service]
Name=org.freedesktop.Notifications
Exec=/usr/lib/notification-daemon/notification-daemon
SystemdService=my-notification-daemon.service

Change the path of Exec to the proper location of the notification-daemon executable: the one shown corresponds to Ubuntu/Debian systems.

Now we need to create a systemd unit in ~/.config/systemd/user/my-notification-daemon.service.

~/.config/systemd/user/my-notification-daemon.service

[Unit]
Description=My notification daemon

[Service]
Type=dbus
BusName=org.freedesktop.Notifications
ExecStart=/usr/lib/notification-daemon/notification-daemon

The path of ExecStart must be the same as Exec above.

With all this, notify-send run on remote-machine will automatically initiate the notification-daemon if none is running.

However, this will not work yet because notification-daemon is an X11 application and needs some environment information to run. We can provide it by running the following command.

(remote-machine) $ dbus-update-activation-environment \
  --systemd DBUS_SESSION_BUS_ADDRESS DISPLAY XAUTHORITY

The command above can be added to the .bashrc of remote-machine so it runs automatically every time we connect. It must run before notification-daemon is activated for the first time; otherwise the activation will fail.
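
Since the daemon cannot start without this environment, it can help to check that the variables are actually exported before handing them to D-Bus. The following is a minimal sketch. The function name unset_vars is made up for illustration, and it relies on the printenv utility being available.

```shell
# unset_vars NAME...
# Hypothetical helper: print the names that are missing (or empty) in the
# exported environment. Returns 0 only when all of them are set.
unset_vars() {
  local name missing=""
  for name in "$@"; do
    # printenv only sees exported variables, which are the ones a child
    # process such as dbus-update-activation-environment can read.
    if [ -z "$(printenv "$name")" ]; then
      missing="${missing:+$missing }$name"
    fi
  done
  if [ -z "$missing" ]; then
    return 0
  fi
  echo "$missing"
  return 1
}

# Possible use in .bashrc (left as a comment so that the sketch has no side
# effects):
#   if missing="$(unset_vars DBUS_SESSION_BUS_ADDRESS DISPLAY)"; then
#     dbus-update-activation-environment \
#       --systemd DBUS_SESSION_BUS_ADDRESS DISPLAY XAUTHORITY
#   else
#     echo "cannot set up notifications, not set: $missing" >&2
#   fi
```

The suggested use only checks DBUS_SESSION_BUS_ADDRESS and DISPLAY, on the assumption that XAUTHORITY may be legitimately unset in some X11 forwarding setups.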

With all this in place, it should now be possible to send a test notification.

(remote-machine) $ notify-send "Hello world"

We should see how a new popup appears to the top right of our screen (possibly with an additional icon to our notification area).

Because this approach is more involved, you may have to troubleshoot a bit. The following command follows the user journal, filtered by the pattern notif, which will show us the D-Bus activations.

(remote-machine) $ journalctl --user --follow -g notif

In my experience the most common error is forgetting to run dbus-update-activation-environment, so notification-daemon fails to start and exits immediately.

Hope this is useful :)

Writing GObjects in C++

2023-02-04T21:46:00+00:00

In the last post I discussed how glibmm, the C++ wrapper of the GLib library, exposes GObjects, and we finished with a rationale for why one would want to write full-fledged GObjects in C++.

Today we explore this avenue and look at some of the pain points we are going to face.

Quick recap

GLib is the foundational library on which other technologies like the GTK GUI toolkit or many components of the GNOME Desktop environment software stack build upon. GLib contains GObject, a dynamic type system that implements a more or less classical OOP paradigm. GLib is written in C and glibmm is the C++ wrapper of GLib.

The GObject type system exposes classes and instances (objects) of classes as normal C data. Mostly for ergonomic reasons, glibmm focuses on the (GObject) instances and does not expose the (GObject) classes as much. This means that our C++ classes will be used to implement the behaviour of (GObject) instances and not so much the behaviour of (GObject) classes.

We need a full-fledged GObject if we want it to interact with other components in the GTK/GNOME Desktop stack. In particular I’m interested in being able to use those C++-written GObjects in .ui files that describe interfaces.

Current approach

Let’s see a simplified version of the example in the gtkmm book on how to use derived widgets with .ui files.

First let’s define a very simple interface made up of an application window that contains a box container, which in turn holds our derived button.

derived.ui

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
<?xml version="1.0" encoding="UTF-8"?>
<interface>
  <object class="GtkApplicationWindow" id="WindowDerived">
    <property name="can_focus">False</property>
    <property name="title" translatable="yes">Derived Builder example</property>
    <property name="default_width">150</property>
    <property name="default_height">100</property>
    <property name="hide_on_close">True</property>
    <child>
      <object class="GtkBox" id="dialog-vbox2">
        <property name="orientation">vertical</property>
        <property name="valign">center</property>
        <child type="end">
          <object class="gtkmm__CustomObject_MyButton" id="quit_button">
            <property name="halign">center</property>
            <property name="label">Quit</property>
            <property name="button-ustring">Button with extra properties</property>
            <property name="button-int">85</property>
          </object>
        </child>
      </object>
    </child>
  </object>
</interface>

Line 14 of derived.ui refers to our custom button class. Because it inherits from a Gtk.Button it inherits its properties such as label or halign (which is actually inherited from Gtk.Widget). We will define our own custom properties button-ustring and button-int whose initial values are set to the values in the XML file ("Button with extra properties" and 85, respectively).

Custom button with extra properties

Let’s define now our custom button.

derivedbutton.h

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
#ifndef DERIVED_BUTTON_H
#define DERIVED_BUTTON_H

#include <gtkmm.h>

class DerivedButton : public Gtk::Button {
public:
  DerivedButton();
  DerivedButton(BaseObjectType *cobject, const Glib::RefPtr<Gtk::Builder> &);
  virtual ~DerivedButton();

  Glib::PropertyProxy<Glib::ustring> property_ustring() {
    return prop_ustring.get_proxy();
  }
  Glib::PropertyProxy<int> property_int() { return prop_int.get_proxy(); }

private:
  Glib::Property<Glib::ustring> prop_ustring;
  Glib::Property<int> prop_int;

  void on_ustring_changed();
  void on_int_changed();
};

#endif

Here we define our two custom properties and we define proxies for them. Proxies will allow us to connect the signal that is emitted when the property changes.

Constructors at lines 8 and 9 deserve some explanation, but first let’s see the implementation of the class.

derivedbutton.cc

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#include "derivedbutton.h"
#include <iostream>

// For creating a dummy object in main.cc.
DerivedButton::DerivedButton()
    : Glib::ObjectBase("MyButton"), prop_ustring(*this, "button-ustring"),
      prop_int(*this, "button-int", 10) {}

void DerivedButton::on_ustring_changed() {
  std::cout << "- ustring property changed! new val " << property_ustring()
            << std::endl;
}

void DerivedButton::on_int_changed() {
  std::cout << "- int property changed! new val " << property_int()
            << std::endl;
}

DerivedButton::DerivedButton(BaseObjectType *cobject,
                             const Glib::RefPtr<Gtk::Builder> &)
    : Glib::ObjectBase("MyButton"), Gtk::Button(cobject),
      prop_ustring(*this, "button-ustring"), prop_int(*this, "button-int", 10) {
  property_ustring().signal_changed().connect(
      sigc::mem_fun(*this, &DerivedButton::on_ustring_changed));
  property_int().signal_changed().connect(
      sigc::mem_fun(*this, &DerivedButton::on_int_changed));
}

DerivedButton::~DerivedButton() {}

The constructor at line 5 is a dummy constructor that we will need later, when initialising the application (or widget library). We need it because GLib distinguishes the registering of a class type in the type system and the instantiation of objects of such type as two different steps. However, glibmm combines both, so we need to make sure the class type exists before we can use it generically from GLib or other libraries using GObject. The only way to do this in glibmm is to instantiate a C++ object of the C++ class wrapping the GObject class.

Unfortunately, this also means that every other constructor needs to behave the same way when it comes to registering the class type. So the constructor at line 19 needs to initialise Glib::ObjectBase and the properties in the same way, to avoid unexpected inconsistencies. This constructor also has to pass the C object (cobject) on to the parent constructor. That object has been built by the generic GObject machinery, so we are wrapping an object that already exists (i.e. the GObject instance was not created as a consequence of instantiating the C++ class DerivedButton, which is the other possible scenario).

Main window

Now let’s look at the main window. This is not a custom widget because we won’t be defining new properties for it. However in C++ we will create a subclass for it as well.

derivedwindow.h

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
#ifndef DERIVED_WINDOW_H
#define DERIVED_WINDOW_H

#include "derivedbutton.h"
#include <gtkmm.h>

class DerivedWindow : public Gtk::ApplicationWindow {
public:
  DerivedWindow(BaseObjectType *cobject,
                const Glib::RefPtr<Gtk::Builder> &builder);
  virtual ~DerivedWindow();

protected:
  // Signal handlers:
  void on_button_quit();

  Glib::RefPtr<Gtk::Builder> m_builder;
  DerivedButton *m_pButton;
};

#endif

Line 9 contains a constructor that, again, wraps a GObject instance created elsewhere. The parameter builder is a reference to Gtk.Builder, the object used to create interfaces from .ui files.

derivedwindow.cc

#include "derivedwindow.h"
#include <iostream>

DerivedWindow::DerivedWindow(BaseObjectType *cobject,
                             const Glib::RefPtr<Gtk::Builder> &builder)
    : Gtk::ApplicationWindow(cobject), m_builder(builder),
      m_pButton(nullptr) {
  // Get the Gtk.Builder-instantiated Button, and connect a signal handler:
  m_pButton = Gtk::Builder::get_widget_derived<DerivedButton>(m_builder,
                                                              "quit_button");
  if (m_pButton) {
    m_pButton->signal_clicked().connect(
        sigc::mem_fun(*this, &DerivedWindow::on_button_quit));
  }
}

DerivedWindow::~DerivedWindow() {}

void DerivedWindow::on_button_quit() {
  // set_visible(false) will cause Gtk::Application::run() to end.
  set_visible(false);
}

The implementation is pretty straightforward: we wrap the GObject that has been created and keep a reference to the Gtk.Builder we receive. Then we use the builder to obtain our derived button. If all goes well, we connect its clicked signal to a handler that hides the window. We will use this later to quit the application.

Main application

The last remaining piece is the entry point of our application.

main.cc

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
#include "derivedwindow.h"
#include <cstring>
#include <iostream>

namespace {

DerivedWindow *pWindow = nullptr;
Glib::RefPtr<Gtk::Application> app;

void on_app_activate() {
  // Create a dummy instance before the call to refBuilder->add_from_file().
  // This creation registers DerivedButton's class in the GObject type system.
  // This is necessary because DerivedButton contains user-defined properties
  // (Glib::Property) and is created by Gtk::Builder.
  static_cast<void>(DerivedButton());

  // Load the GtkBuilder file and instantiate its widgets:
  auto refBuilder = Gtk::Builder::create();
  try {
    refBuilder->add_from_file("derived.ui");
  } catch (...) {
    std::cerr << "Error while loading .ui file\n";
    return;
  }

  // Get the GtkBuilder-instantiated dialog:
  pWindow = Gtk::Builder::get_widget_derived<DerivedWindow>(refBuilder,
      "WindowDerived");

  if (!pWindow) {
    std::cerr << "Could not get the dialog" << std::endl;
    return;
  }

  // It's not possible to delete widgets after app->run() has returned.
  // Delete the dialog with its child widgets before app->run() returns.
  pWindow->signal_hide().connect([]() { delete pWindow; });

  app->add_window(*pWindow);
  pWindow->set_visible(true);
}
} // anonymous namespace

int main(int argc, char **argv) {
  app = Gtk::Application::create("org.gtkmm.example");

  // Instantiate a dialog when the application has been activated.
  // This can only be done after the application has been registered.
  // It's possible to call app->register_application() explicitly, but
  // usually it's easier to let app->run() do it for you.
  app->signal_activate().connect([]() { on_app_activate(); });

  return app->run(argc, argv);
}

Our program will start its execution at line 45. We create a Gtk::Application with a proper app-id and then we connect the activate signal in line 51. Then we run the application in line 53.

The activation signal is connected to the function on_app_activate at line 10. The first thing it does is ensure that our custom GObject class type is registered. This class will be called gtkmm__CustomObject_MyButton inside the GObject type system, and this is the name we used above in our XML file. As I mentioned above, because glibmm combines class registration and object instantiation in a single step, we need to create a dummy object (that will be immediately destroyed) before Gtk.Builder instantiates an object of class gtkmm__CustomObject_MyButton. If you remove line 15, line 20 will fail because Gtk.Builder will not be able to instantiate our custom GObject class.

The rest is more or less straightforward: we get the window instance from the .ui file and connect its hide signal so that the window is destroyed when it is hidden. Recall that in the constructor of DerivedWindow we made our button hide the window, which in turn quits the application. Finally, we make the window visible.

Discussion

This is the approach suggested by glibmm. I think its biggest advantage is that it does not require a lot of additional machinery. However, due to the way glibmm works internally, we need to remember to create a fake instance that registers our class type in GObject. This requires a dummy default constructor (which might be a problem when extending a class that does not have one) in addition to the usual wrapping constructor used by Gtk::Builder. All the constructors we want to have must be kept in sync (though C++ can mitigate this thanks to delegating constructors and non-static data member initialisers).

Let’s see if we can do something a bit more predictable. While the approach used by glibmm is reasonable, for me registering a class type as a side effect of creating an instance breaks the principle of least surprise. In fact, glibmm is so successful at hiding the concept of the GObject class that, unless one starts reading glibmm’s code, it may be difficult to understand how all the pieces fit together. This leaves the user of the library with a “magic” feeling that suddenly turns to unease when we cannot really explain how it all works.

Manual approach

Let’s follow a more manual approach, inspired by what gmmproc does. gmmproc is the wrapping machinery that can be used to wrap GObject-based libraries. I will do this with the DerivedButton class (though a similar approach can be used with DerivedWindow if wanted).

One big downside of this approach is that we need some amount of boilerplate (which gmmproc generates for us when wrapping existing GObject-based libraries).

Custom class helper

We will have to define one C++ class for the GObject class and another one for the GObject instance. For the former we will use a custom helper class that lets us sidestep some of the glibmm defaults.

customclass.h

#ifndef GLIBMM_CUSTOMCLASS_H
#define GLIBMM_CUSTOMCLASS_H

#include <glibmm/class.h>

namespace Glib {
class CustomClass : public Class {
public:
  // Inherit constructors;
  using Class::Class;

  // Reintroduce existing overloads.
  using Class::register_derived_type;
  // Our new overload.
  void register_derived_type(GType base_type,
                             GInstanceInitFunc instance_init = nullptr,
                             const char *type_name = nullptr,
                             GTypeModule *module = nullptr);
};

} // namespace Glib

#endif // GLIBMM_CUSTOMCLASS_H

The implementation is a bit longer, but it basically repeats what Glib::Class does while allowing us to specify a type name.

customclass.cc

#include "customclass.h"

namespace Glib {

void CustomClass::register_derived_type(GType base_type,
                                        GInstanceInitFunc instance_init,
                                        const char *type_name,
                                        GTypeModule *module) {
  if (gtype_)
    return; // already initialized

  // 0 is not a valid GType.
  // It would lead to a crash later.
  // We allow this, failing silently, to make life easier for gstreamermm.
  if (base_type == 0)
    return; // invalid base type, nothing to register

#if GLIB_CHECK_VERSION(2, 70, 0)
  // Don't derive a type if the base type is a final type.
  if (G_TYPE_IS_FINAL(base_type)) {
    gtype_ = base_type;
    return;
  }
#endif

  GTypeQuery base_query = {
      0,
      nullptr,
      0,
      0,
  };
  g_type_query(base_type, &base_query);

  // GTypeQuery::class_size is guint but GTypeInfo::class_size is guint16.
  const guint16 class_size = (guint16)base_query.class_size;

  // GTypeQuery::instance_size is guint but GTypeInfo::instance_size is
  // guint16.
  const guint16 instance_size = (guint16)base_query.instance_size;

  const GTypeInfo derived_info = {
      class_size,
      nullptr,          // base_init
      nullptr,          // base_finalize
      class_init_func_, // Set by the caller ( *_Class::init() ).
      nullptr,          // class_finalize
      nullptr,          // class_data
      instance_size,
      0, // n_preallocs
      instance_init,
      nullptr, // value_table
  };

  if (!(base_query.type_name)) {
    g_critical("Class::register_derived_type(): base_query.type_name is NULL.");
    return;
  }

  gchar *derived_name =
      (type_name && *type_name != '\0')
          ? g_strdup(type_name)
          : g_strconcat("gtkmm__", base_query.type_name, nullptr);

  if (module)
    gtype_ = g_type_module_register_type(module, base_type, derived_name,
                                         &derived_info, GTypeFlags(0));
  else
    gtype_ = g_type_register_static(base_type, derived_name, &derived_info,
                                    GTypeFlags(0));

  g_free(derived_name);
}

} // namespace Glib

Header

With this first piece of boilerplate done, we can focus on manually deriving our button.

derivedbutton.h

1
2
3
4
5
6
7
8
9
10
11
#ifndef GTKMM_EXAMPLE_DERIVED_BUTTON_H
#define GTKMM_EXAMPLE_DERIVED_BUTTON_H

#include "customclass.h"
#include <gtkmm.h>

extern "C" {
// C types
struct ExampleDerivedButton;
struct ExampleDerivedButton_Class;
}

We first declare two opaque types, as if they were the original C types of our GObject. We will use them later.

Next we forward-declare the C++ class that represents the GObject class, and then we define the C++ class that represents the GObject instances.

derivedbutton.h

28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
class DerivedButton_Class;

class DerivedButton : public Gtk::Button {
public:
  DerivedButton(ExampleDerivedButton *object);
  DerivedButton(BaseObjectType *cobject, const Glib::RefPtr<Gtk::Builder> &);
  virtual ~DerivedButton();

  static GType get_type();
  static GType get_base_type();

  ExampleDerivedButton *gobj() const {
    return reinterpret_cast<ExampleDerivedButton *>(gobject_);
  }

  Glib::PropertyProxy<Glib::ustring> property_ustring() {
    return Glib::PropertyProxy<Glib::ustring>(this, "button-ustring");
  }
  Glib::PropertyProxy<int> property_int() {
    return Glib::PropertyProxy<int>(this, "button-int");
  }

  static DerivedButton *wrap(GObject *object, bool take_copy = false);

private:
  friend DerivedButton_Class;
  static DerivedButton_Class derived_button_class;

  static void instance_init_function(GTypeInstance *instance, void *g_class);

  void on_ustring_changed();
  void on_int_changed();

  static void set_property(GObject *object, guint property_id,
                           const GValue *value, GParamSpec *pspec);
  static void get_property(GObject *object, guint property_id, GValue *value,
                           GParamSpec *pspec);

  Glib::ustring button_ustring;

  int button_int;
};

Now the C++ class that represents the GObject class.

derivedbutton.h

71
72
73
74
75
76
77
78
79
80
81
class DerivedButton_Class : public Glib::CustomClass {
private:
public:
  friend class DerivedButton;
  const Glib::Class &init();
  static void class_init_function(void *g_class, void *class_data);

  static Glib::ObjectBase *wrap_new(GObject *object);
};

#endif // GTKMM_EXAMPLE_DERIVED_BUTTON_H

`DerivedButton_Class` implementation

There is a lot to unpack in the header above. I think, however, that it is easier to start with the class DerivedButton_Class. First note the static data member derived_button_class at line 54, in the DerivedButton class. It represents the GObject class and DerivedButton will use it to register the type: registration happens when we obtain a reference to a Glib::Class through DerivedButton_Class::init.

derivedbutton.cc

const Glib::Class &DerivedButton_Class::init() {
  if (!gtype_) {
    class_init_func_ = DerivedButton_Class::class_init_function;
    register_derived_type(DerivedButton::get_base_type(),
                          DerivedButton::instance_init_function, "MyButton");
    Glib::init();
    Glib::wrap_register(gtype_, &wrap_new);
  }
  return *this;
}

gtype_ is a data member inherited from Glib::Class. If it is zero, the class still needs to be registered, so we do that. We set DerivedButton_Class::class_init_function as the class initialisation function (the field class_init_func_ is also inherited, and it is used by the CustomClass::register_derived_type we defined earlier). For simplicity, though this could be done better, we invoke Glib::init, which initialises all the internal machinery of glibmm, and then we link the new type with DerivedButton_Class::wrap_new. Recall that glibmm wraps each GObject with a C++ object, so it needs a way to go from one to the other; here we associate the type with the function that creates the wrapper. The creation function looks like this:

derivedbutton.cc

Glib::ObjectBase *DerivedButton_Class::wrap_new(GObject *object) {
  return new DerivedButton((ExampleDerivedButton *)object);
}

Finally when an object of the class is instantiated for the first time our class initialisation function (DerivedButton_Class::class_init_function) will be invoked. It looks like this.

derivedbutton.cc

void DerivedButton_Class::class_init_function(void *g_class, void *class_data) {
  g_print("%s\n", __PRETTY_FUNCTION__);
  auto *const gobject_class = static_cast<GObjectClass *>(g_class);

  gobject_class->get_property = DerivedButton::get_property;
  gobject_class->set_property = DerivedButton::set_property;

  g_object_class_install_property(
      gobject_class, PROPERTY_INT,
      g_param_spec_int(
          "button-int", "", "", G_MININT, G_MAXINT, 0,
          static_cast<GParamFlags>(G_PARAM_READWRITE | G_PARAM_CONSTRUCT)));
  g_object_class_install_property(
      gobject_class, PROPERTY_STRING,
      g_param_spec_string(
          "button-ustring", "", "", "",
          static_cast<GParamFlags>(G_PARAM_READWRITE | G_PARAM_CONSTRUCT)));

  const auto cpp_class = static_cast<Gtk::Button_Class *>(g_class);
  Gtk::Button_Class::class_init_function(cpp_class, class_data);
}

We basically install a couple of properties (using the C API, I don’t think we can do much better here) and then we proceed to initialise the base class, in our case Gtk::Button. PROPERTY_INT and PROPERTY_STRING are a couple of enumerators that we use to identify these properties in this class.

derivedbutton.cc

enum PropertyId {
  INVALID_PROPERTY,
  PROPERTY_INT,
  PROPERTY_STRING,
};

This completes our implementation of the class. Note that we referred to a couple of functions in DerivedButton (get_property and set_property) that access the properties we have just installed.

`DerivedButton` implementation

I’m going to list here only the functions that have changes.

derivedbutton.cc

DerivedButton::DerivedButton(BaseObjectType *cobject,
                             const Glib::RefPtr<Gtk::Builder> &)
    : Gtk::Button(cobject) {
  property_ustring().signal_changed().connect(
      sigc::mem_fun(*this, &DerivedButton::on_ustring_changed));
  property_int().signal_changed().connect(
      sigc::mem_fun(*this, &DerivedButton::on_int_changed));
}

The constructor that can be invoked by the builder is almost the same; it does not have to invoke the constructor of ObjectBase in any special way.

Ideally we would use this constructor, but it turns out that we may build the wrapping C++ object earlier. So let’s add one constructor for this case.

derivedbutton.cc

DerivedButton::DerivedButton(ExampleDerivedButton *obj)
    : Gtk::Button((GtkButton *)obj) {
  property_ustring().signal_changed().connect(
      sigc::mem_fun(*this, &DerivedButton::on_ustring_changed));
  property_int().signal_changed().connect(
      sigc::mem_fun(*this, &DerivedButton::on_int_changed));
}

Needless to say, even though I did not do it here, the shared body of both constructors can be factored out.

One of the functions that GObject requires is an instance initialisation function but ours does not have to do anything special because we will keep the state in the C++ object and not in the GObject itself.

derivedbutton.cc

void DerivedButton::instance_init_function(GTypeInstance *instance,
                                           void * /* g_class */) {
  // Does nothing.
}

There are two functions used when registering the GObject class in DerivedButton_Class. They return GTypes, which is how GObject identifies types (they are just integer handles). We need one for the current class (MyButton) and one for the base (GtkButton).

derivedbutton.cc

GType DerivedButton::get_type() {
  return derived_button_class.init().get_type();
}

GType DerivedButton::get_base_type() { return GTK_TYPE_BUTTON; }

When requesting the current type, this will register the type using the init member function of DerivedButton_Class.

Finally we need a function that knows how to wrap a C GObject representing our class (not the C++ one) into a C++ object, creating one if needed. This is done using Glib::wrap_auto. This function will invoke, if there is no C++ wrapper object for the GObject, the function DerivedButton_Class::wrap_new shown earlier and that we registered in glibmm when registering the new GObject class type.

derivedbutton.cc

DerivedButton *DerivedButton::wrap(GObject *object, bool take_copy) {
  return dynamic_cast<DerivedButton *>(Glib::wrap_auto(object, take_copy));
}
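
To make the lookup-or-create behaviour more concrete, here is a minimal, standalone sketch of the idea. It does not use glibmm at all: CObject, Wrapper, ButtonWrapper and WrapRegistry are hypothetical names made up for this illustration, and the std::map that associates a C object with its wrapper is an assumption of the sketch, chosen only to keep it self-contained.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-in for a C object; type_id plays the role of a GType.
struct CObject {
  int type_id;
};

// Hypothetical base class of all C++ wrappers.
class Wrapper {
 public:
  explicit Wrapper(CObject *object) : object_(object) {}
  virtual ~Wrapper() = default;
  CObject *cobj() const { return object_; }

 private:
  CObject *object_;
};

// Hypothetical wrapper for one concrete type.
class ButtonWrapper : public Wrapper {
 public:
  using Wrapper::Wrapper;
};

class WrapRegistry {
 public:
  using Factory = std::function<std::unique_ptr<Wrapper>(CObject *)>;

  // Analogous to linking a type with its creation function.
  void register_factory(int type_id, Factory factory) {
    factories_[type_id] = std::move(factory);
  }

  // Returns the existing wrapper of the object, creating one through the
  // registered factory only if there is none yet.
  Wrapper *wrap(CObject *object) {
    auto found = wrappers_.find(object);
    if (found != wrappers_.end()) return found->second.get();
    auto factory = factories_.find(object->type_id);
    if (factory == factories_.end()) return nullptr;  // unknown type
    auto inserted = wrappers_.emplace(object, factory->second(object));
    return inserted.first->second.get();
  }

 private:
  std::map<int, Factory> factories_;
  std::map<CObject *, std::unique_ptr<Wrapper>> wrappers_;
};
```

Wrapping the same CObject twice returns the same Wrapper, and the factory runs only once. That is the property we rely on when we dynamic_cast the result to the derived wrapper type.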

I mentioned earlier that we need a couple of functions to access the properties. We still need to implement them. Those functions are basically C interfaces, but most of the time we can still use the glibmm wrappers.

derivedbutton.cc

void DerivedButton::set_property(GObject *object, guint property_id,
                                 const GValue *value, GParamSpec *pspec) {
  DerivedButton *this_ = DerivedButton::wrap(object);
  g_assert(this_);

  switch (property_id) {
  case PROPERTY_INT: {
    Glib::Value<int> v;
    v.init(value);
    int new_val = v.get();
    if (new_val != this_->button_int) {
      this_->button_int = new_val;
      g_object_notify_by_pspec(object, pspec);
    }
    break;
  }
  case PROPERTY_STRING: {
    Glib::Value<Glib::ustring> v;
    v.init(value);
    Glib::ustring new_val = v.get();
    if (new_val != this_->button_ustring) {
      this_->button_ustring = v.get();
      g_object_notify_by_pspec(object, pspec);
    }
    break;
  }
  default: {
    G_OBJECT_WARN_INVALID_PROPERTY_ID(object, property_id, pspec);
    break;
  }
  }
}

void DerivedButton::get_property(GObject *object, guint property_id,
                                 GValue *value, GParamSpec *pspec) {
  DerivedButton *this_ = DerivedButton::wrap(object);
  g_assert(this_);

  switch (property_id) {
  case PROPERTY_INT: {
    Glib::Value<int> v;
    v.init(v.value_type());
    v.set(this_->button_int);
    g_value_copy(v.gobj(), value);
    break;
  }
  case PROPERTY_STRING: {
    Glib::Value<Glib::ustring> v;
    v.init(v.value_type());
    v.set(this_->button_ustring);
    g_value_copy(v.gobj(), value);
    break;
  }
  default: {
    G_OBJECT_WARN_INVALID_PROPERTY_ID(object, property_id, pspec);
    break;
  }
  }
}

An interesting bit of trivia here is that Glib::Property<T>, as provided by glibmm, installs the properties when creating the object wrapper, while we have installed them when creating the class.

Another difference is that glibmm’s generic functions to get and set properties will always notify about changes, even if the property is set to the value it already held. We show a simple way to implement a more precise mechanism here.

Another interesting fact is that the call to DerivedButton::wrap happens while the GObject is being initialised via Gtk.Builder. This means that the new constructor we added is the one that gets invoked. The previous one will not run: when the DerivedWindow class tries to obtain the derived button, the wrapper object already exists.

Registering the type

Finally we need to make sure the type exists. We do that by registering it at the beginning of the application, in the same place where before we had to create a dummy instance instead.

main.cc

void on_app_activate() {
  // Make sure the type has been registered.
  g_type_ensure(DerivedButton::get_type());
  // ...
}

Discussion

When writing the wrapper manually, we need a moderate amount of boilerplate. In defense of gtkmm, though, the boilerplate is more or less at the level of what one usually needs when implementing GObjects in C. Also, a few things cannot be done in C++ (because glibmm does not wrap much on the side of the classes), so we end up invoking C interfaces.

One interesting thing we have not addressed is signals. Unfortunately, signals require the creation of a function that correctly marshals the parameters. I think some C++ template pixie dust can help here, but the function must exist. Adding new signals is, thus, not trivial.

Finally, one thing that may not be obvious, is that the GObject will always entail the existence of a C++ wrapper. This is a fundamental aspect of glibmm, so while we can implement a full-fledged GObject, it will always require its C++ counterpart around.

Conclusion

Given the seamless integration between C and C++, it is relatively straightforward to fully write a new GObject using C++. The recommended approach in the gtkmm documentation has the downside that it requires a default constructor (imposing this requirement on the base class) and the creation of a dummy object that causes the registration of the new GObject class.

When written manually, the amount of boilerplate is significant and, given that glibmm does not wrap much of the C API for classes itself, we find ourselves forced to use GObject C interfaces.

All in all, I believe the recommended approach is more reasonable as long as we understand the nuance with the registration of the derived GObject class.

Wrapping GObjects in C++

2023-01-15T06:55:00+00:00

GObject is the foundational dynamic type system implemented on top of the C language. It is used by libraries such as GLib and GTK and by many other components, most of them part of the GNOME desktop environment stack.

I’ve been lately wrapping a C library that uses GObject for C++ and I learned about some of the challenges.

GObject

Any general programming language can be used under the Object Oriented Programming (OOP) paradigm, and the difference between them is whether the language offers built-in support for that or not. So, when we say that Java is OOP we basically mean that the language has concepts which are meant to support this paradigm out of the box.

C is not one of those languages.

For reasons lost in the mist of time, related to the origins of the GNU Image Manipulation Program, the GTK toolkit, a GUI toolkit, was written in C. Its foundations are built on top of a library called GLib. GLib provides GObject: a library-based OOP type system built on top of C. GTK and other libraries, part of the GNOME Desktop software stack, are built on top of GObject.

Now, GObject is powerful (just read about it), but it also acknowledges the fact that there are more programming languages than just C, even if C serves as the common denominator here.

This is also the current reality: C these days can be seen as an interoperable layer between programming languages. Most foreign-function interfaces (foreign as in “written in another programming language”) target C as the interoperable layer. There are technical reasons for that fact, which are out of scope of this blog.

C++ is not, strictly, a superset of C but it can interoperate with C very, very easily (the C heritage in C++ enables this and also fuels many pain points of C++ itself). And C++, even if it has been dubbed as “multi paradigm”, has reasonable support for OOP.

So it makes sense to provide a C++ interface to GObject.

Wrapping on top of glibmm

GLib is the library that contains GObject and there already exists a C++ version of it called glibmm.

glibmm, along with another component called mm-common, allows systematically wrapping GObject-based C libraries in a consistent and coherent way. This is achieved using a tool called gmmproc. I used this approach for my wrap of libadwaitamm.

There are some design decisions made by glibmm that permeate and impact the wrappers.

Classes and objects

Because GObject is actually a library and implements an OOP type system, all the concepts of such system must exist as entities of the program. When working on a typical OOP language like C++ or Java, the concept of “class” is a concept provided and supported by the language itself.

This is not the case in GObject. Classes are entities represented in the memory of the program like regular data.

In fact when reading the GObject tutorial you will identify lots of steps required to register (or bring up) a class in GObject. GObject programmers identify that some of those steps are annoying and feel like boilerplate. To ease the pain they use C macros so the GObject classes can be declared and defined in a more convenient way.

Toshio Sekiya made this excellent GObject tutorial in C that is worth checking.

Once a class has been registered in GObject, we can instantiate it.

glibmm tries to make the use of GObject instances as convenient as regular C++ objects so it combines the class registration in GObject with the instantiation of a GObject class.

This works most of the time but complicates the process because classes themselves do not have a “constructor” method in C++ (only instances do). These “class constructors” are used to register class-level attributes like signals and properties.

glibmm solves this problem by using a secondary class, which is automatically generated by the wrapping machinery, that represents the class itself. This class object is used as a singleton of the application and it is initialised upon the creation of the first instance of a GObject class. This initialisation can then invoke a function that registers properties, signals and interface implementations.
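
The following standalone sketch illustrates the general shape of this idea: a singleton "class object" whose initialisation runs once, triggered by the first instance. It is not glibmm code. ClassRecord, ButtonClass and Button are hypothetical names, and the "label" property is invented for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a GObject class: plain data living in memory.
struct ClassRecord {
  std::string name;
  std::vector<std::string> properties;  // class-level attributes
};

// Hypothetical "class object": there is a single one per wrapped type.
class ButtonClass {
 public:
  // Registers the class on first use; later calls reuse the same record.
  const ClassRecord &init() {
    if (!registered_) {
      record_.name = "SketchButton";
      class_init(record_);  // the "class constructor" runs exactly once
      registered_ = true;
      ++init_calls_;
    }
    return record_;
  }

  int init_calls() const { return init_calls_; }

 private:
  // Class-level set-up such as installing properties would happen here.
  static void class_init(ClassRecord &record) {
    record.properties.push_back("label");
  }

  bool registered_ = false;
  int init_calls_ = 0;
  ClassRecord record_;
};

// Creating an instance triggers the class registration as a side effect.
class Button {
 public:
  Button() : class_(&button_class().init()) {}

  const ClassRecord &klass() const { return *class_; }

  static ButtonClass &button_class() {
    static ButtonClass instance;  // the application-wide singleton
    return instance;
  }

 private:
  const ClassRecord *class_;
};
```

Note that nothing is registered until the first Button is constructed. This coupling between instantiation and registration is convenient, but it is also the reason why a class that has never been instantiated is unknown to the type system.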

Signals

Signals in GObject are close to what in other programming languages (like C# or Java) are called delegates or listeners. It is possible to connect to a signal so a piece of code, as a callback, is executed when something happens. Signals can be arbitrarily defined by a GObject class so the GObject instance can emit those signals as needed.

glibmm was written in a pre-C++11 world and back then it used the libsigc++ library to ensure type-safety in the callbacks (something that C can’t do and it is sometimes [ab]used by the C libraries). This library is still very useful these days, but in a post-C++11 world some of the heavy lifting can be delegated to the C++ standard library itself.

libsigc++ provides two concepts: signals (something that can be emitted) and slots (something that can be connected to a signal and will be invoked when the signal is emitted). Because libsigc++ is generic and not tied to glibmm (even if it is, maybe, one of its biggest users), the glibmm wrapping machinery has to translate a signal callback (a C callback) into a proper libsigc++ slot. Luckily, almost all callbacks in GObject are closures that receive a void* argument where anything related to the context can be passed to the callback. This way, when wrapping a GObject implemented in C, the wrapping machinery connects the existing (GObject’s) signals to a callback (a free function, typically generated) that unwraps the context pointer into libsigc++’s slots for that libsigc++ signal.
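
As an illustration of that unwrapping step, here is a minimal sketch that uses only the C++ standard library. std::function plays the role of the slot, and CSignal, c_signal_connect and c_signal_emit are hypothetical stand-ins for a C signal API; none of these names come from GObject or libsigc++.

```cpp
#include <functional>
#include <vector>

// Hypothetical C-style signal API: a callback plus an opaque context pointer.
using CCallback = void (*)(int value, void *user_data);

struct CSignal {
  struct Handler {
    CCallback callback;
    void *user_data;
  };
  std::vector<Handler> handlers;
};

inline void c_signal_connect(CSignal &signal, CCallback callback,
                             void *user_data) {
  signal.handlers.push_back({callback, user_data});
}

inline void c_signal_emit(const CSignal &signal, int value) {
  for (const auto &handler : signal.handlers)
    handler.callback(value, handler.user_data);
}

// The C++ side: a type-safe slot.
using Slot = std::function<void(int)>;

// Trampoline: a free function with the C signature that recovers the slot
// from the context pointer and invokes it.
inline void slot_trampoline(int value, void *user_data) {
  auto *slot = static_cast<Slot *>(user_data);
  (*slot)(value);
}

// Connects a slot to the C signal. In this sketch the caller must keep the
// slot alive for as long as it stays connected.
inline void connect_slot(CSignal &signal, Slot &slot) {
  c_signal_connect(signal, &slot_trampoline, &slot);
}
```

The C side only ever sees slot_trampoline and a void*, while the C++ side gets a callable with a checked signature. A real wrapper also has to manage the lifetime of the slot and its disconnection, which this sketch leaves out.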

Properties

Many OOP programming languages (like C# or Object Pascal) have the concept of “properties”. They look like object attributes (fields) but can invoke a function when reading or writing the attribute.

GObject properties follow this philosophy and introduce a couple of extra features: properties have a (GObject) signal associated to them that can be used to signal updates to the property and can be generically read and written using GObject generic mechanisms. These two features allow properties to be bound to other properties and build expressive GUIs with reasonable effort.

For instance, if we have a hypothetical list widget with a property number-of-elements, we can bind this property to the sensitive property of a Gtk.Button intended to clear that list widget. This way, we can enable or disable the button based on whether the list widget contains items. More complex scenarios are possible using Gtk.Expression.

Properties are implemented in GObject with two callbacks that are invoked when a property is read or written, respectively.
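
To get a feel for why change notification makes binding possible, consider this small standalone sketch. Property and bind_property are hypothetical names written for this example with the standard library only; they are not the GObject or glibmm APIs. Skipping the notification when the value does not change is a choice made by the sketch.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical observable property: stores a value and notifies listeners
// whenever the value actually changes.
template <typename T>
class Property {
 public:
  explicit Property(T initial) : value_(std::move(initial)) {}

  const T &get() const { return value_; }

  void set(T new_value) {
    if (new_value == value_) return;  // unchanged: nothing to notify
    value_ = std::move(new_value);
    for (const auto &listener : listeners_) listener(value_);
  }

  void connect_changed(std::function<void(const T &)> listener) {
    listeners_.push_back(std::move(listener));
  }

 private:
  T value_;
  std::vector<std::function<void(const T &)>> listeners_;
};

// One-way binding: target always holds transform(source). The target must
// outlive the source in this sketch because it is captured by reference.
template <typename S, typename T, typename F>
void bind_property(Property<S> &source, Property<T> &target, F transform) {
  target.set(transform(source.get()));  // synchronise once up front
  source.connect_changed(
      [&target, transform](const S &value) { target.set(transform(value)); });
}
```

With this in place, the list widget example amounts to binding a Property<int> holding the number of elements to a Property<bool> holding the sensitivity, using a transform such as n > 0.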

The challenge of subclassing

Now, if our goal was to only wrap existing GObjects, a scenario that all the machinery of glibmm supports very well, we would be done.

Although the GObject type system allows introducing new fundamental types (which are mostly meant to represent built-in language types such as int or double), most of the new types defined by a library or application are created by subclassing (even if indirectly) the GObject.Object class type itself.

Now, subclassing a class in GObject means registering a class and letting the registration procedure know the parent class (GObject, like Java or C# but in contrast to C++, allows only a single base class). In glibmm this would be burdensome, because the separate class that represents the class entity in GObject is not convenient to write manually.

Along those lines, glibmm devised a convenient mechanism in which, by using regular C++ inheritance, one can create a new class almost transparently.

Subclassing is magic

Consider that you want to subclass Gtk::Button.

You can just do

class MyButton : public Gtk::Button {
 public:
  MyButton(const Glib::ustring &label) : Gtk::Button(label) {}
  // ...
};

And that’s it. No need for a separate MyButton_Class or the likes that represents the GObject class itself. Cool, but how does this work?

gmmproc-wrapped classes always register a derived class that just clones the original wrapped class. In the case of Gtk::Button, the original C class is GtkButton. The wrapped code registers (just once) a gtkmm__GtkButton class in the GObject typesystem and makes it a subclass of GtkButton. The reason why this is done is in order to allow implementing a virtual method mechanism, explained below.

Note, however, that no class is registered in GObject for MyButton. In the eyes of GObject any instance of MyButton is just a gtkmm__GtkButton.

Virtual methods

GObject would not be a complete OOP mechanism if it did not support polymorphism via virtual table classes. In the C implementation, virtual methods are implemented as pointers to functions and those are overridden explicitly by subclasses in the “class constructor” by setting them to point to specific functions.

Virtual methods are exposed as a convenience in gmmproc-wrapped classes as regular C++ virtual methods. To make this work, however, the class must have overridden the GObject virtual method so that it ultimately calls the C++ virtual method. This can only happen in the “class constructor”. By subclassing with a wrapper that introduces no extra data, gmmproc-wrapped classes can override GObject virtual methods at will.
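
The mechanism can be sketched without any GObject at all. In the following standalone example CButton and CButtonClass are hypothetical C-style structs in the spirit of GObject, and the Button wrapper overrides the function pointer with a trampoline that forwards to a C++ virtual method. All names are made up for the illustration.

```cpp
#include <string>

// Hypothetical C-style instance and class structs.
struct CButtonClass;
struct CButton {
  const CButtonClass *klass;
  void *cpp_wrapper;  // back-pointer to the C++ wrapper object
};
struct CButtonClass {
  void (*clicked)(CButton *self);  // virtual method slot
};

// C++ wrapper that exposes the slot as a regular C++ virtual method.
class Button {
 public:
  Button() {
    c_object_.klass = &wrapper_class();
    c_object_.cpp_wrapper = this;
  }
  Button(const Button &) = delete;  // the back-pointer must stay valid
  virtual ~Button() = default;

  // Emulates C code invoking the virtual method through the class struct.
  void click() { c_object_.klass->clicked(&c_object_); }

  const std::string &log() const { return log_; }

 protected:
  virtual void on_clicked() { log_ += "base;"; }
  std::string log_;

 private:
  // Plays the role of the "class constructor" of the derived C class: it
  // overrides the slot, once, with the trampoline below.
  static const CButtonClass &wrapper_class() {
    static const CButtonClass klass = [] {
      CButtonClass k{};
      k.clicked = &Button::clicked_trampoline;
      return k;
    }();
    return klass;
  }

  // C-compatible function that dispatches to the C++ virtual method.
  static void clicked_trampoline(CButton *self) {
    static_cast<Button *>(self->cpp_wrapper)->on_clicked();
  }

  CButton c_object_{};
};

// A subclass only needs regular C++ overriding.
class MyButton : public Button {
 protected:
  void on_clicked() override { log_ += "mine;"; }
};
```

Calling click() on a MyButton goes through the function pointer in the class struct, lands in the trampoline and ends up in MyButton::on_clicked. No additional class struct had to be written for MyButton.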

This is exactly what happens with Gtk.Button.clicked virtual method. When initialising the class gtkmm__GtkButton this virtual method is made to invoke a C++ virtual method (generated by gmmproc) called on_clicked. If the method is not actually overridden in the subclass, gmmproc calls the current virtual method implementation (if any).

class MyButton : public Gtk::Button {
 public:
  MyButton(const Glib::ustring &label) : Gtk::Button(label) {}

  void on_clicked() override {
    // ...
  }
};

Properties

But if we did not create a new GObject class to represent MyButton and we’re just using C++’s own mechanism for virtual methods, what about new signals or properties we might want to add?

This is where this convenient scheme of inheriting, one that does not require a description of the class, starts showing its limits.

First we need to make sure the new class is actually a new one. This can be achieved using a different constructor of Glib::ObjectBase. While the root of the hierarchy is Glib::Object (it wraps GObject.Object), Glib::ObjectBase is a virtual base of Glib::Object that is used to change some of the behaviour when creating Glib::Object. Glib::ObjectBase has a constructor where you can specify a class name.

class MyButton : public Gtk::Button
{
 public:
  MyButton(const Glib::ustring &label) :
    Glib::ObjectBase("MyButton"),
    Gtk::Button(label) {}
  // ...
};

When using this constructor, glibmm will register a new class gtkmm__CustomObject_MyButton. And this allows us to define properties.

class MyButton : public Gtk::Button {
 public:
  MyButton(const Glib::ustring &label)
      : Glib::ObjectBase("MyButton"), Gtk::Button(label) {}

  Glib::Property<int> my_value{*this, "my-value", 0};
};

Now, properties are class-level attributes so ideally those should be registered (installed) in the class constructor, which we cannot access. However, GObject allows installing properties later and this is what happens when executing the constructor of the property my_value that is run as part of the constructor of MyButton.

Signals

What about signals? Unfortunately, as far as I can tell, there is no straightforward way to install new custom GObject signals.

Note that libsigc++ can be used in some signalling scenarios as an alternative to GObject signals. This is because, in contrast to properties, GObject signals do not seem to be composable between them. So we may only need a thing that acts like a wrapped signal even if it is not a proper GObject signal.

If we do want a GObject signal, one thing we can do is use Glib::ExtraClassInit, which allows us to define our own class initialisation function. But note that this will be executed the first time we instantiate our class. This fragile (at least to me) behaviour is again part of the price we pay for not decoupling the C++ class that represents instances from the C++ class that represents the GObject class itself.

Why would we want to use C++ to write a GObject?

If we look at the wrapper libraries as a means to write C++, one might think that we only need the minimal wrapping surface and then be able to use C++, outside of GObject, to develop the rest of the functionality.

While I do not think it is essential to be able to write a GObject in C++ so it can be called from outside C++ (this would force us to provide a C interface anyway), I think it is useful to be able to bring up a GObject in C++ so it can be used with some of the convenient machinery that GTK provides: mainly .ui files and Gtk.Builder.

Now, .ui files are very powerful and can do lots of things for us in a convenient way. But this can only happen if the GTK library sees a full-fledged GObject. The class type must have been registered in GObject and its properties, signals and interfaces must have been registered during class initialisation (not later, like glibmm allows us to do).

And I would like to use C++ to do that, as much as possible. So in a next post I will explore some approaches I have been using in my projects.

Bisecting flaky tests with rspec and GitHub Actions

2022-08-04T00:00:00+00:00

Ah, those good, old flaky test suites! Sooner or later you’ll encounter one of them. They are test suites that sometimes pass, sometimes fail, depending on certain environmental conditions. A lot has been written about flaky tests and what causes them, but in this post I’d like to discuss a specific type of flaky test, order-dependent test failures, and how to help debug them using GitHub Actions as part of your CI/CD pipelines.

Order-dependent test failures

An order-dependent test failure is one that happens when:

There is more than one test being run as part of the suite.
One of the tests fails only when the suite is run in a specific order.

Let’s simplify things and assume you have a very small test suite consisting of two tests: Test A and Test B. This post will assume we’re using Ruby as our language of choice, and rspec as our testing framework; however, the fundamentals apply to any other language with a good testing framework. In this case, we might be dealing with a situation like this:

When we run Test A, it passes.
When we run Test B, it passes.
When we run Test A and Test B, they both pass.
When we run Test B and Test A, Test B passes but Test A fails.

If using rspec in its default configuration, you are probably running your test suite in a random order. This makes rspec generate a random seed and use that seed to determine in which order tests should be run. When running the above test suite using rspec in a random order, you can expect your suite to break roughly 50% of the time, since only one of the two possible orderings fails.

However, order-dependent test failures can be very pernicious because they are introduced silently. They can make your test suite fail only occasionally, which leads to developers being lazy and using the “retry the tests until they pass” technique. The bogus test doesn’t get dealt with until it’s too late: the test suite now fails often, causing delays in releases, frustration, or even panic situations when the need for a quick release arises. There’s nothing worse than having to hotfix a production issue quickly and not being able to because your test suite keeps failing.

Bisect to the rescue

One of the features of rspec is the ability to run a bisect. Once you discover an order-dependent failure and can consistently reproduce it with a fixed seed, it can still be difficult to determine which test is causing the issue. In our example we only have 2 tests, but in bigger test suites the failing test might be executed after hundreds of other tests, making it hard to determine which one of them is the bad apple. Bisect solves that problem by repeatedly running subsets of your tests to find the minimal set of examples that reproduces the same failures. The way you run bisect is by providing rspec with the exact same options and seed that caused the order-dependent failure, and adding the --bisect flag to the CLI. Internally, bisect will split the tests into two chunks, run those tests, discard the chunk that does not fail, and carry on recursively until the smallest failing set of tests is found.
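
The halving idea itself is independent of the language and the framework. The sketch below is a deliberately simplified model of it, written in C++ with the standard library only, and it is not how rspec implements bisect. TestId, Reproduces and bisect are hypothetical names, and the sketch assumes there is an oracle that can tell whether running a given subset of examples before the failing one still reproduces the failure.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using TestId = int;

// Hypothetical oracle: runs the given examples, in order, followed by the
// failing example, and returns true if the failure still reproduces.
using Reproduces = std::function<bool(const std::vector<TestId> &)>;

// Narrows down the examples that must run before the failing one.
std::vector<TestId> bisect(std::vector<TestId> candidates,
                           const Reproduces &reproduces) {
  while (candidates.size() > 1) {
    const auto middle =
        candidates.begin() +
        static_cast<std::ptrdiff_t>(candidates.size() / 2);
    std::vector<TestId> first(candidates.begin(), middle);
    std::vector<TestId> second(middle, candidates.end());
    if (reproduces(first)) {
      candidates = std::move(first);   // discard the second chunk
    } else if (reproduces(second)) {
      candidates = std::move(second);  // discard the first chunk
    } else {
      // The culprits are spread over both chunks. A real tool would keep
      // going in a smarter way; this sketch simply stops here.
      break;
    }
  }
  return candidates;
}
```

With a single culprit among n candidates, this needs at most two suite runs per halving step, so the number of runs grows with the logarithm of n rather than with n.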

Our example

I have created a proof of concept gem with a test suite that has an order-dependent failure. The repository can be checked at brafales/flaky_specs_poc.

If you’re not interested in the nitty gritty of why this particular test suite is problematic and are only interested in the GitHub Actions Workflow file, please skip this section.

The problematic spec in this gem is spec/flaky_specs_poc/job_two_spec.rb. This proof of concept uses Sidekiq to show a common testing issue with this popular background job processing framework.

Sidekiq works on the basis of jobs, which get pushed into a queue, and then picked up by a worker process. Sidekiq will use a backend to store jobs, for example a redis instance; however, when running your tests you might not want to have to mess around with having a redis instance available for use. For this reason, Sidekiq in test mode uses a virtual backend which will queue jobs in memory, and doesn’t process them by default.

If you want to test a bit of code that queues a Sidekiq job, you do it like this:

# frozen_string_literal: true

require "sidekiq/testing"

RSpec.describe FlakySpecsPoc::JobOne do
  it "queues an HttpJob" do
    expect do
      subject.perform
    end.to change(FlakySpecsPoc::HttpJob.jobs, :size).by(1)
  end
end

This is a good way to check that your code did the right thing (queue a job) without having to worry about the specifics about what that job does. It’s essentially the same as mocking a third party HTTP request.

However, sometimes you might want to know not only that a job was queued, but also that a certain side effect of that job having run took place. One might argue that this is a bad test since we should not be testing for side effects, but the reality is these kinds of tests (especially feature or end-to-end tests) are ubiquitous. For this, Sidekiq provides a special method that allows you to run queued jobs immediately, in an inline fashion. This method can be used in two ways:

With a block, where inline test mode will be enabled for the code that runs inside the block, and disabled once the code in the block has been executed.
Without a block, which enables inline testing globally.

And it’s very easy to do something like this in a spec where you want your Sidekiq jobs to run inline:

# frozen_string_literal: true

require "sidekiq/testing"

RSpec.describe FlakySpecsPoc::JobOne do
  before do
    Sidekiq::Testing.inline!
  end

  it "checks something done by the HttpJob" do
    VCR.use_cassette("job_one") do
      subject.perform
    end
    expect(true).to eq(true)
  end
end

What the code above does when run is enable Sidekiq inline testing and leave it on for the rest of the test suite execution. The problem with this is that if another test runs after this one and queues a Sidekiq job, that job will be run inline instead of being queued in memory. If that test does not expect that, it’ll fail, but only when it runs after the first test.

I’ve recreated this scenario in my gem by having a spec that tests that a job is queued, then having a spec that mistakenly enables inline testing for Sidekiq globally, and finally by having the Sidekiq job that gets queued make an HTTP request. I’m using VCR to record and then mock external HTTP calls.

So what happens is the following:

If the test that checks if a job is queued runs first, it passes, because no external HTTP calls are made, since the Sidekiq job simply gets queued in memory, but never executed inline.
If the test that sets inline testing runs first, then when the other test runs after it, the Sidekiq job will run, make an HTTP call and cause a failure since VCR does not expect that external call to be made.

For reference, this is a correct way to write this spec:

# frozen_string_literal: true

require "sidekiq/testing"

RSpec.describe FlakySpecsPoc::JobOne do
  around do |spec|
    Sidekiq::Testing.inline! do
      spec.call
    end
  end

  it "checks something done by the HttpJob" do
    VCR.use_cassette("job_one") do
      subject.perform
    end
    expect(true).to eq(true)
  end
end

You can easily recreate this by running the following command on the gem source code:

bundle exec rspec --order=rand --seed=55702

Which should give you this output:

Randomized with seed 55702

FlakySpecsPoc::JobOne
  checks something done by the HttpJob

FlakySpecsPoc::HttpJob
  gets a response from a server

FlakySpecsPoc::JobOne
  queues an HttpJob (FAILED - 1)

Failures:

  1) FlakySpecsPoc::JobOne queues an HttpJob
     Failure/Error: res = Net::HTTP.get_response(uri)

     VCR::Errors::UnhandledHTTPRequestError:


       ================================================================================
       An HTTP request has been made that VCR does not know how to handle:
         GET https://reqbin.com/echo/get/json

       There is currently no cassette in use. There are a few ways
       you can configure VCR to handle this request:

         * If you're surprised VCR is raising this error
           and want insight about how VCR attempted to handle the request,
           you can use the debug_logger configuration option to log more details [1].
         * If you want VCR to record this request and play it back during future test
           runs, you should wrap your test (or this portion of your test) in a
           `VCR.use_cassette` block [2].
         * If you only want VCR to handle requests made while a cassette is in use,
           configure `allow_http_connections_when_no_cassette = true`. VCR will
           ignore this request since it is made when there is no cassette [3].
         * If you want VCR to ignore this request (and others like it), you can
           set an `ignore_request` callback [4].

       [1] https://www.relishapp.com/vcr/vcr/v/6-1-0/docs/configuration/debug-logging
       [2] https://www.relishapp.com/vcr/vcr/v/6-1-0/docs/getting-started
       [3] https://www.relishapp.com/vcr/vcr/v/6-1-0/docs/configuration/allow-http-connections-when-no-cassette
       [4] https://www.relishapp.com/vcr/vcr/v/6-1-0/docs/configuration/ignore-request
       ================================================================================
     # ./lib/flaky_specs_poc/http_job.rb:13:in `perform'
     # ./lib/flaky_specs_poc/job_one.rb:10:in `perform'
     # ./spec/flaky_specs_poc/job_one_spec.rb:8:in `block (3 levels) in <top (required)>'
     # ./spec/flaky_specs_poc/job_one_spec.rb:7:in `block (2 levels) in <top (required)>'

Finished in 0.03305 seconds (files took 0.85052 seconds to load)
3 examples, 1 failure

Failed examples:

rspec ./spec/flaky_specs_poc/job_one_spec.rb:6 # FlakySpecsPoc::JobOne queues an HttpJob

Randomized with seed 55702

Run the same test suite with a different seed though:

bundle exec rspec --order=rand --seed=3164

And everything’s good:

Randomized with seed 3164

FlakySpecsPoc::HttpJob
  gets a response from a server

FlakySpecsPoc::JobOne
  queues an HttpJob

FlakySpecsPoc::JobOne
  checks something done by the HttpJob

Finished in 0.02717 seconds (files took 0.39785 seconds to load)
3 examples, 0 failures

Randomized with seed 3164

In this case, given we have very few tests, this could be relatively easy to debug manually, but with a bigger test suite we can use rspec bisect:

bundle exec rspec --order=rand --seed=55702 --bisect

Which will give us the following:

Bisect started using options: "--order=rand --seed=55702"
Running suite to find failures... (0.10595 seconds)
Starting bisect with 1 failing example and 2 non-failing examples.
Checking that failure(s) are order-dependent... failure appears to be order-dependent

Round 1: bisecting over non-failing examples 1-2 .. ignoring example 2 (0.19095 seconds)
Bisect complete! Reduced necessary non-failing examples from 2 to 1 in 0.25318 seconds.

The minimal reproduction command is:
  rspec './spec/flaky_specs_poc/job_one_spec.rb[1:1]' './spec/flaky_specs_poc/job_two_spec.rb[1:1]' --order=rand --seed=55702

And now we know how to consistently reproduce the error with the minimum number of tests, which will make pinpointing the sneaky bogus test easier.
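To give an intuition of what a bisect does, here is a minimal sketch in Python. It is not RSpec's actual algorithm: `bisect_culprits` and the `still_fails` callback are hypothetical names, and the callback stands in for "run this subset followed by the failing example and report whether it still fails".

```python
def bisect_culprits(candidates, still_fails):
    """Shrink `candidates` while `still_fails(subset)` remains true.

    `still_fails` is a hypothetical callback that would run `subset`
    plus the failing example and tell us whether the failure persists.
    """
    current = list(candidates)
    while len(current) > 1:
        half = len(current) // 2
        left, right = current[:half], current[half:]
        if still_fails(left):
            current = left
        elif still_fails(right):
            current = right
        else:
            # The failure needs examples from both halves, so stop here.
            break
    return current


# Toy stand-in for the suite: the failure only shows up when the
# example from job_two_spec.rb runs before the failing one.
def toy_still_fails(subset):
    return "job_two_spec.rb[1:1]" in subset


suite = ["http_job_spec.rb[1:1]", "job_two_spec.rb[1:1]"]
culprits = bisect_culprits(suite, toy_still_fails)
print(culprits)
```

Each round halves the set of passing examples that must run before the failing one, which is why the number of rounds grows slowly even for large suites.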

Automating bisects

The next step is clear: automate it! I’m going to show you a GitHub Actions Workflow that will automatically run a bisect on a failing test suite.

First of all a couple of disclaimers:

This has not been productionised, so as usual, use at your own risk ;)
This flow does a bisect on failing test suites. This will make your test pipeline slower, since a bunch of failing tests will be run twice, including failures which are not caused by flaky tests!

Here’s the complete flow:

name: Ruby

on:
  push:
    branches:
      - main

  pull_request:

jobs:
  RunTests:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Ruby
      uses: ruby/setup-ruby@v1
      with:
        bundler-cache: true
    - name: Run the tests
      id: tests
      continue-on-error: true
      run: bundle exec rspec --order=rand -f j -o tmp/rspec_results.json
    - name: Bisect flaky specs
      if: steps.tests.outcome != 'success'
      run: bundle exec rspec --order=rand --seed $(cat tmp/rspec_results.json | jq '.seed') --bisect

The first bit of the flow is a pretty standard way of doing things. The bits that interest us are the Run the tests and Bisect flaky specs steps.

This step will run our tests:

- name: Run the tests
  id: tests
  continue-on-error: true
  run: bundle exec rspec --order=rand -f j -o tmp/rspec_results.json

--order=rand will ensure the suite is run in random order.
-f j will make sure the output of the tests is in JSON format. This is important since we need to be able to parse the test results easily.
-o tmp/rspec_results.json sends the results into a file instead of STDOUT.
We also use continue-on-error: true to tell GitHub Actions that when the tests fail, the rest of the steps will still be executed, otherwise on a test failure the flow would immediately end.

And this is the step that will run a bisect:

- name: Bisect flaky specs
  if: steps.tests.outcome != 'success'
  run: bundle exec rspec --order=rand --seed $(cat tmp/rspec_results.json | jq '.seed') --bisect

A few noteworthy bits:

if: steps.tests.outcome != 'success' will ensure this step is only run if the original test suite failed.
We use cat tmp/rspec_results.json | jq '.seed' to get the seed that was originally used to run the tests, so we can pass it to the bisect.

For reference, this is what an rspec result in JSON format looks like:

{
    "version": "3.11.0",
    "seed": 55702,
    "examples": [
        {
            "id": "./spec/flaky_specs_poc/job_two_spec.rb[1:1]",
            "description": "checks something done by the HttpJob",
            "full_description": "FlakySpecsPoc::JobOne checks something done by the HttpJob",
            "status": "passed",
            "file_path": "./spec/flaky_specs_poc/job_two_spec.rb",
            "line_number": 16,
            "run_time": 0.009731,
            "pending_message": null
        },
        {
            "id": "./spec/flaky_specs_poc/http_job_spec.rb[1:1]",
            "description": "gets a response from a server",
            "full_description": "FlakySpecsPoc::HttpJob gets a response from a server",
            "status": "passed",
            "file_path": "./spec/flaky_specs_poc/http_job_spec.rb",
            "line_number": 4,
            "run_time": 0.003383,
            "pending_message": null
        },
        {
            "id": "./spec/flaky_specs_poc/job_one_spec.rb[1:1]",
            "description": "queues an HttpJob",
            "full_description": "FlakySpecsPoc::JobOne queues an HttpJob",
            "status": "failed",
            "file_path": "./spec/flaky_specs_poc/job_one_spec.rb",
            "line_number": 6,
            "run_time": 0.021981,
            "pending_message": null,
            "exception": {
                "class": "VCR::Errors::UnhandledHTTPRequestError",
                "message": "\n\n================================================================================\nAn HTTP request has been made that VCR does not know how to handle:\n  GET https://reqbin.com/echo/get/json\n\nThere is currently no cassette in use. There are a few ways\nyou can configure VCR to handle this request:\n\n  * If you're surprised VCR is raising this error\n    and want insight about how VCR attempted to handle the request,\n    you can use the debug_logger configuration option to log more details [1].\n  * If you want VCR to record this request and play it back during future test\n    runs, you should wrap your test (or this portion of your test) in a\n    `VCR.use_cassette` block [2].\n  * If you only want VCR to handle requests made while a cassette is in use,\n    configure `allow_http_connections_when_no_cassette = true`. VCR will\n    ignore this request since it is made when there is no cassette [3].\n  * If you want VCR to ignore this request (and others like it), you can\n    set an `ignore_request` callback [4].\n\n[1] https://www.relishapp.com/vcr/vcr/v/6-1-0/docs/configuration/debug-logging\n[2] https://www.relishapp.com/vcr/vcr/v/6-1-0/docs/getting-started\n[3] https://www.relishapp.com/vcr/vcr/v/6-1-0/docs/configuration/allow-http-connections-when-no-cassette\n[4] https://www.relishapp.com/vcr/vcr/v/6-1-0/docs/configuration/ignore-request\n================================================================================\n\n",
                "backtrace": [
                    "REDACTED FOR LEGIBILITY"
                ]
            }
        }
    ],
    "summary": {
        "duration": 0.037856,
        "example_count": 3,
        "failure_count": 1,
        "pending_count": 0,
        "errors_outside_of_examples_count": 0
    },
    "summary_line": "3 examples, 1 failure"
}

What we do with this file is send it to the jq tool for parsing, telling it to get us the value of the top-level key seed. jq is a really useful and powerful tool, so I suggest you check it out if you’re unfamiliar with it.
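If jq is not available, the same extraction can be sketched with Python's standard json module. This is only an illustration: `bisect_command_from_report` is a hypothetical helper and the report below is a trimmed-down stand-in for tmp/rspec_results.json.

```python
import json


def bisect_command_from_report(report_text):
    """Build the bisect command line from an RSpec JSON report (as text)."""
    report = json.loads(report_text)
    seed = report["seed"]  # same top-level key that jq '.seed' reads
    return ["bundle", "exec", "rspec", "--order=rand", f"--seed={seed}", "--bisect"]


# Trimmed-down stand-in for the real report file.
report_text = '{"version": "3.11.0", "seed": 55702, "summary": {"failure_count": 1}}'
command = bisect_command_from_report(report_text)
print(" ".join(command))
```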

Below you can see a screenshot of this flow successfully bisecting our example test suite.

Conclusions

In this post we have learned about a specific, pernicious test failure that manifests itself when a test suite is run in a specific order. We have then seen how a technique called bisecting can help determine which test, of potentially many, is causing the failure. Last but not least, we have shown a GitHub Actions Workflow that will automatically run the bisect task when a test suite fails.

This is a very small, toy example of how to make this work. Your real-life test suites are probably a lot bigger and more complex, so this example might not work for you as is, but the fundamentals should be the same.

OpenSSH as a SOCKS server

2022-01-03T22:03:00+00:00

Sometimes we are given access via ssh to nodes that do not have, for policy or technical reasons, access to the internet (i.e. they cannot make outbound connections). Depending on the policies, we may be able to open reverse SSH tunnels, so things are not so bad.

Recently I discovered that OpenSSH comes with a SOCKS proxy server integrated. This is probably a well known feature of OpenSSH but I thought it was interesting to share how it can be used.

SOCKS

Nowadays, access to the Internet is ubiquitous and most of the time assumed as a fact. However, in some circumstances, direct access to the Internet is not available or not desirable. In those cases we can resort to proxy servers that act as intermediaries between the Internet and the node without direct access.

Many tools used commonly assume one is connected to the Internet: package managers such as pip and cargo can automatically download the files required to install a package. If no outbound connection is possible, software deployment and installation becomes complicated.

However, most of the time, those tools only require HTTP/HTTPS support. So a proxy that only forwards HTTP and HTTPS requests is enough. Examples of this kind of proxy are tinyproxy and squid.

SOCKS is a general proxy protocol that can be used for any TCP connection, not only those for HTTP/HTTPS. An interesting thing is that ssh comes with an integrated SOCKS proxy which is relatively easy to use. Most tools that can use an HTTP/HTTPS proxy can often use a SOCKS proxy too, so this is a handy option to consider.
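To make concrete why SOCKS is not tied to HTTP, here is a small sketch that builds the CONNECT request a SOCKS5 client sends, following the layout in RFC 1928. The request only names a host and a TCP port, and whatever protocol runs afterwards is opaque to the proxy. The function name is made up for this illustration, and the earlier method-negotiation exchange is omitted.

```python
import struct


def socks5_connect_request(host, port):
    """Return the bytes of a SOCKS5 CONNECT request for a domain name."""
    name = host.encode("ascii")
    if len(name) > 255:
        raise ValueError("domain name too long for a SOCKS5 request")
    header = bytes([
        0x05,       # protocol version 5
        0x01,       # command: CONNECT
        0x00,       # reserved
        0x03,       # address type: domain name
        len(name),  # length of the domain name
    ])
    # The port is sent as two bytes in network (big-endian) order.
    return header + name + struct.pack("!H", port)


request = socks5_connect_request("sh.rustup.rs", 443)
print(request.hex())
```

Nothing in the request says HTTP: port 443 could just as well be 22 or any other TCP port.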

Example: Installing Rust through a proxy

If we try to install Rust on a machine that does not allow outbound connections, this is what happens. (Let’s ignore the question whether piping a download directly to the shell is a reasonable thing to do).

user@no-internet$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

This command will likely time out after a long time because outbound connections are silently dropped and the installation will fail.

Set up proxy server

To address this, let’s first open a SOCKS proxy using ssh on our local machine (with-internet). This machine must have internet access (change user to your username). ssh will request you to authenticate (via password or ssh key).

user@with-internet$ ssh -N -D 127.0.0.1:12345 user@localhost

The flag -N means not to execute a command and -D interface:port means to open the port bound to the interface. This is the SOCKS proxy. In this example we are opening port 12345 and binding it to the 127.0.0.1 (localhost) interface. We are using the same machine as the proxy, hence user@localhost (it is possible to use another node, but we don’t have to given that with-internet already can connect to the internet). This must stay running so you will have to open another terminal and set up the reverse tunnel.

To set up the reverse tunnel do the following.

user@with-internet$ ssh -R 127.0.0.1:9999:127.0.0.1:12345 -N user@no-internet

This opens the port 9999 in the host without internet (no-internet) and binds it to its localhost (i.e. the localhost of no-internet) then it tunnels it to the port 12345 bound to the interface 127.0.0.1 of our local node (with-internet). Again this will not run any command (due to -N) and the syntax of -R is -R remote-interface:remote-port:local-interface:local-port. Keep this command running.
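The two ssh commands are tied together by the SOCKS port: the port opened by -D must be the local end of the -R tunnel. The sketch below only builds the two command lines from a few parameters and does not run anything; `tunnel_commands` is a hypothetical helper and the default ports are simply the ones used in this post.

```python
def tunnel_commands(user, remote_host, socks_port=12345, remote_port=9999):
    """Return the argv lists of the two ssh commands, without running them."""
    loopback = "127.0.0.1"
    # SOCKS proxy on the machine that has internet access.
    socks_proxy = ["ssh", "-N", "-D", f"{loopback}:{socks_port}", f"{user}@localhost"]
    # Reverse tunnel: remote-interface:remote-port:local-interface:local-port.
    reverse_tunnel = [
        "ssh", "-R", f"{loopback}:{remote_port}:{loopback}:{socks_port}",
        "-N", f"{user}@{remote_host}",
    ]
    return socks_proxy, reverse_tunnel


socks_proxy, reverse_tunnel = tunnel_commands("user", "no-internet")
print(" ".join(socks_proxy))
print(" ".join(reverse_tunnel))
```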

Note: Because we are using an unprivileged port on no-internet and the -D option does not allow setting authentication, anyone in no-internet could proxy connections through with-internet. Do this only on a no-internet host you trust.

Proxy configuration

Now we can set up curl to use a SOCKS proxy. We do this with the --proxy option. For convenience we will first download the installation script into a file.

user@no-internet$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs \
                       --proxy socks5://localhost:9999 -o  install-rust.sh

We can do a quick check that it contains what we expect

user@no-internet$ head install-rust.sh 
#!/bin/sh
# shellcheck shell=dash

# This is just a little script that can be downloaded from the internet to
# install rustup. It just does platform detection, downloads the installer
# and runs it.

# It runs on Unix shells like {a,ba,da,k,z}sh. It uses the common `local`
# extension. Note: Most shells limit `local` to 1 var per line, contra bash.

Install Rust

We can set the https_proxy environment variable to point to the SOCKS server so it is used by the installation script.

user@no-internet$ export https_proxy=socks5://localhost:9999

Now we are ready to install Rust using the script we downloaded.

user@no-internet$ bash install-rust.sh

info: downloading installer

Welcome to Rust!

This will download and install the official compiler for the Rust
programming language, and its package manager, Cargo.

Rustup metadata and toolchains will be installed into the Rustup
home directory, located at:

  /home/user/.rustup

This can be modified with the RUSTUP_HOME environment variable.

The Cargo home directory located at:

  /home/user/.cargo

This can be modified with the CARGO_HOME environment variable.

The cargo, rustc, rustup and other commands will be added to
Cargo's bin directory, located at:

  /home/user/.cargo/bin

This path will then be added to your PATH environment variable by
modifying the profile files located at:

  /home/user/.profile
  /home/user/.zshenv

You can uninstall at any time with rustup self uninstall and
these changes will be reverted.

Current installation options:


   default host triple: x86_64-unknown-linux-gnu
     default toolchain: stable (default)
               profile: default
  modify PATH variable: yes

1) Proceed with installation (default)
2) Customize installation
3) Cancel installation
>1

info: profile set to 'default'
info: default host triple is x86_64-unknown-linux-gnu
info: syncing channel updates for 'stable-x86_64-unknown-linux-gnu'
info: latest update on 2021-12-02, rust version 1.57.0 (f1edd0429 2021-11-29)
info: downloading component 'cargo'
info: downloading component 'clippy'
info: downloading component 'rust-docs'
info: downloading component 'rust-std'
 24.9 MiB /  24.9 MiB (100 %)  19.9 MiB/s in  1s ETA:  0s
info: downloading component 'rustc'
 53.9 MiB /  53.9 MiB (100 %)  20.1 MiB/s in  2s ETA:  0s
info: downloading component 'rustfmt'
info: installing component 'cargo'
info: installing component 'clippy'
info: installing component 'rust-docs'
  5.3 MiB /  17.9 MiB ( 29 %)   1.7 MiB/s in  6s ETA:  7s
...

Once Rust is installed, you can set up cargo so it always uses this proxy.

Example: Using pip with SOCKS

pip is used to install Python packages. Unfortunately pip does not support SOCKS by default. If you try to install yapf using the configuration above this happens:

user@no-internet$ pip install --proxy=socks5://localhost:9999 yapf
Collecting yapf
ERROR: Could not install packages due to an EnvironmentError: Missing dependencies for SOCKS support.

Based on this answer from Stack Overflow we need to first install pysocks. Now we have a chicken-and-egg situation that we need to solve: we cannot download pysocks on the no-internet machine! To solve it, download pysocks locally:

user@with-internet$ python3 -m pip download pysocks
Collecting pysocks
  Downloading PySocks-1.7.1-py3-none-any.whl (16 kB)
Saved ./PySocks-1.7.1-py3-none-any.whl
Successfully downloaded pysocks

Copy this Python wheel file to no-internet, for instance using scp.

user@with-internet$ scp PySocks-1.7.1-py3-none-any.whl user@no-internet:

And install it manually there. I’m installing it in the user environment (--user flag) because in this machine I don’t have enough permissions, but your mileage may vary here.

user@no-internet$ pip install --user PySocks-1.7.1-py3-none-any.whl 
Processing ./PySocks-1.7.1-py3-none-any.whl
Installing collected packages: PySocks
Successfully installed PySocks-1.7.1

If we use pip and SOCKS, now we succeed.

user@no-internet$ pip install --user --proxy=socks5://localhost:9999 yapf
Collecting yapf
  Downloading https://files.pythonhosted.org/packages/47/88/843c2e68f18a5879b4fbf37cb99fbabe1ffc4343b2e63191c8462235c008/yapf-0.32.0-py2.py3-none-any.whl (190kB)
     |████████████████████████████████| 194kB 933kB/s 
Installing collected packages: yapf
Successfully installed yapf-0.32.0

Yay!

Cleanup

Recall that we have two connections opened: one is the SOCKS proxy (-D) and the other the reverse tunnel (-R). Just end them both with Ctrl-C and you are done. I’m sure this can be scripted somehow but given that the ssh commands may require password input, this is not a trivial thing to do.

Distributed compilation in a cluster

2021-12-31T16:20:00+00:00

In software development there is an unavoidable trend in which applications become larger and more complex. For compiled programming languages one of the consequences is that their compilation takes longer.

Today I want to talk about using distcc to speed up C/C++ compilation using different nodes in a scientific cluster.

Distributed compilation

Some programming languages, like C++, are slow to compile. Ideally, the root causes of the slowness would be attacked and we would call it a day. However, the real causes of the slowness are many and they are not trivial to solve. So as an alternative, for now, we can try to throw more resources at compiling C++.

distcc is a tool that helps us do so by distributing a C/C++ compilation across several nodes accessible via a network. distcc has relatively low expectations about the nodes: ideally you only need the same compiler installed everywhere. This is because the default operation mode of distcc is based on distributing the preprocessed files. This works, but we can do a bit better if we are also able to distribute the preprocessing; distcc calls this pump mode.

Scientific clusters

Scientific clusters are designed to execute applications that need lots of computational resources. As such they are usually structured as one or more login nodes and a set of computational nodes. Users can connect to login nodes but can only access computational nodes after they have allocated the resources. The allocation is requested from the login node. A common resource manager to do that is Slurm.

Example: compiling LLVM on 4 nodes

Install distcc

I’ll assume distcc is already installed in <install-distcc>. It is not difficult to install from source. Make sure the installation directory is in the PATH.

$ export PATH=<install-distcc>/bin:$PATH

Set up distcc

In general, using login nodes for anything other than allocating computational resources is frowned upon. So we will request 5 nodes. One of them, the main, will be used to orchestrate the compilation, and the other 4 will be used for the compilation itself.

For this example I’m going to use Slurm. Even if your cluster is using Slurm too, its site-configuration may be different and there may be small operational differences. Check your site documentation.

First of all, let’s request the resources using the salloc command from Slurm.

$ salloc --qos=debug --time=01:00:00 -N 5
salloc: Pending job allocation 12345678
salloc: job 12345678 queued and waiting for resources
salloc: job 12345678 has been allocated resources
salloc: Granted job allocation 12345678
salloc: Waiting for resource configuration
salloc: Nodes z04r1b64,z05r2b[38-39],z09r2b56,z13r1b01 are ready for job

In the cluster I’m using, debug is a special partition meant to do debugging or compiling applications. I’m requesting 5 nodes for 1 hour.

The allocation is often fulfilled quickly (but this depends on the level of utilisation of the cluster, which often correlates with deadlines!).

The cluster I’m using automatically logs you in to the first allocated node (z04r1b64). Some other clusters may require you to ssh into it first.

We need to make sure distcc will allow us to use the compiler we plan to use. So edit the file <install-distcc>/etc/distcc/commands.allow.sh and add the full path to the compiler you want to use. In my case I will be using clang installed in a non-default path.

<install-distcc>/etc/distcc/commands.allow.sh

allowed_compilers="
  /usr/bin/cc
  /usr/bin/c++
  /usr/bin/c89
  /usr/bin/c99
  /usr/bin/gcc
  /usr/bin/g++
  /usr/bin/*gcc-*
  /usr/bin/*g++-*
  /apps/LLVM/12.0.1/GCC/bin/clang
  /apps/LLVM/12.0.1/GCC/bin/clang++
"

Now we are going to use clush to start the distcc daemon on the other nodes.

$ clush -w "z05r2b[38-39],z09r2b56,z13r1b01" \
        <install-distcc>/bin/distccd --daemon --allow 10.0.0.0/8 --enable-tcp-insecure

Luckily, clush understands Slurm’s nodeset notation, so we can just use it directly (but make sure you do not pass the main node). We ask distccd to run as a daemon (--daemon) and to allow all the nodes of the private LAN used by the cluster. The flag --enable-tcp-insecure is required because, for simplicity, we will not use the masquerade feature of distcc. This is a safety feature that should be considered later.

At this point we must set up the DISTCC_HOSTS environment variable. Unfortunately distcc cannot use Slurm’s nodeset notation. A tool called nodeset (from clush itself) will come in handy here.

$ nodeset -e z04r1b64,z05r2b[38-39],z09r2b56,z13r1b01
z05r2b38 z05r2b39 z04r1b64 z09r2b56 z13r1b01
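To show what the expansion involves, here is a simplified sketch in Python. It is not how nodeset is implemented: `expand_nodeset` is a hypothetical function that only handles one bracket group per name, and it keeps the input order rather than the order printed by nodeset.

```python
import re


def expand_nodeset(nodeset):
    """Expand a simplified nodeset such as 'a[1-3],b7' into host names."""
    hosts = []
    # Split on the commas that are not inside a bracket group.
    for item in re.findall(r"[^,\[]+(?:\[[^\]]*\])?[^,]*", nodeset):
        match = re.fullmatch(r"([^\[]*)\[([^\]]*)\](.*)", item)
        if not match:
            hosts.append(item)  # plain host name, nothing to expand
            continue
        prefix, body, suffix = match.groups()
        for part in body.split(","):
            low, _, high = part.partition("-")
            high = high or low  # a single number is a range of one
            width = len(low)    # preserve zero padding such as 01
            for number in range(int(low), int(high) + 1):
                hosts.append(f"{prefix}{number:0{width}d}{suffix}")
    return hosts


nodes = expand_nodeset("z04r1b64,z05r2b[38-39],z09r2b56,z13r1b01")
print(" ".join(nodes))
```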

The variable DISTCC_HOSTS has the following syntax for each host: host/limit,options. We are going to use as options lzo,cpp. lzo means compressing the files during network transfers and cpp will allow us to enable the pump mode. Each of our nodes has 48 cores, so let’s use that as a limit (distcc’s default is 4 concurrent jobs). Also let’s not forget to remove the main node. We can use the following script.

setup_distcc.sh

#!/usr/bin/env bash

CURRENT=$(hostname)
DISTCC_HOSTS=""
NODES="$(nodeset -e ${SLURM_NODELIST})"

for i in ${NODES}
do
  # Ignore current host.
  if [ "$i" = "${CURRENT}" ];
  then
   continue;
  fi;
  if [ -n "${DISTCC_HOSTS}" ];
  then
    # Add separator.
    DISTCC_HOSTS+=" "
  fi
  DISTCC_HOSTS+="$i/48,lzo,cpp"
done

export DISTCC_HOSTS

$ source setup_distcc.sh
$ echo $DISTCC_HOSTS
z05r2b38/48,lzo,cpp z05r2b39/48,lzo,cpp z09r2b56/48,lzo,cpp z13r1b01/48,lzo,cpp

Now we are ready to build.

LLVM’s cmake

In a separate build directory, invoke cmake as usual. Make sure you specify the compiler you want to use. The easiest way to do this is setting CC and CXX.

We will compile only clang in this example.

# Source of llvm is in ./llvm-src
$ mkdir llvm-build
$ cd llvm-build
$ CC="/apps/LLVM/12.0.1/GCC/bin/clang" CXX="/apps/LLVM/12.0.1/GCC/bin/clang++" \
  cmake -G Ninja ../llvm-src/llvm \
  -DCMAKE_BUILD_TYPE=Release "-DLLVM_ENABLE_PROJECTS=clang" \
  -DLLVM_ENABLE_LLD=ON -DCMAKE_RANLIB=$(which llvm-ranlib) \
  -DCMAKE_AR=$(which llvm-ar) -DLLVM_PARALLEL_LINK_JOBS=48 \
  -DCMAKE_C_COMPILER_LAUNCHER=distcc -DCMAKE_CXX_COMPILER_LAUNCHER=distcc

The important flags here are -DCMAKE_C_COMPILER_LAUNCHER=distcc and -DCMAKE_CXX_COMPILER_LAUNCHER=distcc. The build system will use these when building (but not during configuration, which is convenient). The other flags ensure a release build, force the build system to use lld, limit the number of (local) concurrent link jobs to 48, and select llvm-ranlib and llvm-ar from LLVM, which are faster than their usual GNU counterparts.

The cmake invocation should complete successfully.

Before we continue, we must make sure the variables CPATH, C_INCLUDE_PATH and CPLUS_INCLUDE_PATH are not set, otherwise the pump mode will refuse to work.

$ unset CPATH C_INCLUDE_PATH CPLUS_INCLUDE_PATH

Now we can invoke ninja, telling it to use 192 (= 48 × 4) concurrent jobs. We have to use pump to enable distcc’s pump mode.

$ time pump ninja -j$((48 * 4))
__________Using distcc-pump from <install-distcc>/bin
__________Using 4 distcc servers in pump mode
[4382/4382] Linking CXX executable bin/c-index-test
__________Shutting down distcc-pump include server

real	3m0.985s
user	4m42.380s
sys	2m56.167s

3 minutes to compile clang+LLVM in Release mode is not bad 😀

Fun with vectors in the Raspberry Pi 1 - Part 9

2021-08-22T09:48:00+00:00

I think we have enough pieces of machinery working already that we can start with the most exciting part of this journey: autovectorisation!

SIMD

Data parallelism

If we look at the steps of the computation required by an algorithm, we may find that often the precise order between some of the steps is not relevant. When this happens we say that those steps could run concurrently and the algorithm would still be correct. We can call concurrency the number of operations that can be executed concurrently. When concurrency is somewhat related (or directly proportional) to the amount of data being processed by the algorithm we can say that it exposes data parallelism.

The following C program tells us to add elements from two arrays from 0 to N-1 but nothing in it requires that order. For instance, we could run from N-1 to 0 and the observable effect would be identical.

simple_add.c

enum { N = 1024 };

float a[N];
float b[N];
float c[N];

void vector_sum(void) {
  for (int i = 0; i < N; i++) {
    c[i] = a[i] + b[i];
  }
}
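The order independence of this loop can be checked with a quick sketch. It is written in Python rather than C just to keep it short, and `vector_sum` here is a hypothetical variant that takes the iteration order as a parameter.

```python
import random


def vector_sum(a, b, order):
    """Add a and b element-wise, visiting the indices in the given order."""
    c = [0.0] * len(a)
    for i in order:
        c[i] = a[i] + b[i]
    return c


N = 1024
a = [float(i) for i in range(N)]
b = [2.0 * i for i in range(N)]

forward = list(range(N))
backward = forward[::-1]
shuffled = forward[:]
random.Random(0).shuffle(shuffled)  # fixed seed so the run is repeatable

results = [vector_sum(a, b, order) for order in (forward, backward, shuffled)]
print(results[0] == results[1] == results[2])
```

Each iteration reads and writes only its own index, so no iteration depends on another one. That is the property that lets several of them run at the same time.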

We can exploit data parallelism in many ways: we can distribute parts of the computation over different computers in a cluster, over different threads in a multicore or, the approach we are interested in, between the different elements of a vector of a CPU using SIMD instructions. The important assumption here is that all of the mentioned approaches can perform several computations simultaneously in time.

SIMD precisely represents this idea: Single Instruction Multiple Data. With a single CPU instruction we can process more than one element of data. We can obtain performance gains from using SIMD instructions if the CPU can execute them in a similar amount of time as their scalar counterparts. All of this depends on the amount of resources that the CPU has. The ARM1176JZF-S that powers the Raspberry Pi 1 does not devote extra resources, so vector instructions take proportionally longer and we will not improve the performance a lot. However there are still some small gains here: each instruction executed comes with a (small) fixed cost which we are now avoiding.

Autovectorisation

Compilers may be able to identify in some circumstances that the source code is expressing data parallel computation. We will focus on loops though it is possible to identify similar cases for straight-line code.

Once those cases are identified, the compiler may be able to implement the scalar computation using vector instructions. This process is called vectorisation.

Historically automatic vectorisation has been a bit disappointing. Compilers must be very careful not to break the semantics of the program. Some programming languages, such as C and C++, require very complex analyses to determine the safety of the vectorisation. This process is also time consuming and it is not commonly enabled by default in mainstream compilers. So many interesting loops which potentially could be vectorised are not vectorised. Or worse, the programmer has to adapt the code so it ends up being vectorised.

Enable vectorisation in LLVM

Vectorisation is necessarily a target-specific transformation. LLVM IR is not platform neutral but its genericity helps reuse code between architectures. Sometimes the LLVM IR optimisation passes need information from the backend to assess if a transformation is profitable or not.

The loop vectoriser is not an exception to this, so before we can get it to vectorise simple codes, we need to teach the ARM backend about the new vector reality.

A first thing we need to do is to let LLVM know how many vector registers there are. We mentioned that in practice it is as if there were 6 of them.

llvm/lib/Target/ARM/ARMTargetTransformInfo.h

   unsigned getNumberOfRegisters(unsigned ClassID) const {
     bool Vector = (ClassID == 1);
     if (Vector) {
+      if (ST->hasVFP2Base())
+        return 6;
       if (ST->hasNEON())
         return 16;
       if (ST->hasMVEIntegerOps())
         return 8;
       return 0;
     }
 
     if (ST->isThumb1Only())
       return 8;
     return 13;
   }

A second thing we need to let LLVM know is the size of our vectors. Because we aim only for vectors that can hold either 4 floats or 2 doubles and both cases amount to 128 bits, we will claim that size.

llvm/lib/Target/ARM/ARMTargetTransformInfo.h

   TypeSize getRegisterBitWidth(TargetTransformInfo::RegisterKind K) const {
     switch (K) {
     case TargetTransformInfo::RGK_Scalar:
       return TypeSize::getFixed(32);
     case TargetTransformInfo::RGK_FixedWidthVector:
+      if (ST->hasVFP2Base())
+        return TypeSize::getFixed(128);
       if (ST->hasNEON())
         return TypeSize::getFixed(128);
       if (ST->hasMVEIntegerOps())
         return TypeSize::getFixed(128);
       return TypeSize::getFixed(0);
     case TargetTransformInfo::RGK_ScalableVector:
       return TypeSize::getScalable(0);
     }
     llvm_unreachable("Unsupported register kind");
   }

With all this we can try our loop above.

$ clang --target=armv6-linux-gnueabihf -O2 -S -o- simple_add.c

vector_sum:
	.fnstart
@ %bb.0:
	.save	{r11, lr}
	push	{r11, lr}
	ldr	r12, .LCPI0_0
	ldr	lr, .LCPI0_1
	ldr	r3, .LCPI0_2
	mov	r0, #0
.LBB0_1:                                @ =>This Inner Loop Header: Depth=1
	add	r1, r12, r0
	add	r2, lr, r0
	vldr	s2, [r1]
	vldr	s0, [r2]
	add	r1, r3, r0
	add	r0, r0, #4
	cmp	r0, #4096
	vadd.f32	s0, s2, s0
	vstr	s0, [r1]
	bne	.LBB0_1
@ %bb.2:
	pop	{r11, pc}
	.p2align	2

Uhm, this is not what we wanted, right? The reason is that in general the vectoriser will try not to make unsafe transformations. VFP instructions are not 100% compliant with IEEE-754 so the vectoriser will not use them by default.

We need to tell the compiler “it is OK, let’s use not 100% precise instructions” by using -O2 -ffast-math or -Ofast (which, in clang, also raises the optimisation level to -O3).

$ clang --target=armv6-linux-gnueabihf -Ofast -S -o- simple_add.c

vector_sum:
	.fnstart
@ %bb.0:
	.save	{r4, lr}
	push	{r4, lr}
	.vsave	{d8, d9, d10, d11}
	vpush	{d8, d9, d10, d11}
	ldr	r12, .LCPI0_0
	ldr	lr, .LCPI0_1
	ldr	r3, .LCPI0_2
	mov	r0, #0
.LBB0_1:                                @ =>This Inner Loop Header: Depth=1
	add	r1, r12, r0
	add	r2, lr, r0
	vldmia	r1, {s16, s17, s18, s19, s20, s21, s22, s23}
	vldmia	r2, {s8, s9, s10, s11, s12, s13, s14, s15}
	vmrs	r2, fpscr
	mov	r1, #196608
	bic	r2, r2, #458752
	orr	r2, r2, r1
	vmsr	fpscr, r2
	vadd.f32	s12, s20, s12
	vadd.f32	s8, s16, s8
	add	r1, r3, r0
	add	r0, r0, #32
	add	r2, r1, #20
	cmp	r0, #4096
	vstmia	r2, {s13, s14, s15}
	vstmia	r1, {s8, s9, s10, s11, s12}
	bne	.LBB0_1
@ %bb.2:
	vmrs	r1, fpscr
	bic	r1, r1, #458752
	vmsr	fpscr, r1
	vpop	{d8, d9, d10, d11}
	pop	{r4, pc}
	.p2align	2

This is more interesting!

Note however we have some gross inefficiency here: we are changing the vector length in every iteration of the loop (setting it to 4: the value 196608 written to fpscr is 3 << 16, and the length field holds the vector length minus one). Later in this post we will evaluate if it is worth trying to hoist it out of the loop.

A very simple benchmark

Let’s use this simple benchmark that computes a vector addition of floats (similar to the code shown above in the post). It also has a mechanism to validate the scalar version and the vector version. #pragma clang loop is used to explicitly disable vectorisation in the scalar loops that would otherwise be vectorised.

The benchmark can be adjusted for the number of times we run it and the size of the vector. This is useful to run it in different scenarios.

This benchmark has a low ratio of computation over memory accesses. We do one addition and three memory accesses (two loads and one store). This means that the arithmetic intensity of this benchmark is small. We may not be able to observe a lot of improvement with vector instructions.
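As a rough worked example of that ratio, assuming 4-byte floats and counting each load or store as one 4-byte access (`arithmetic_intensity` is a name made up for this sketch):

```python
def arithmetic_intensity(flops, memory_accesses, bytes_per_access=4):
    """Floating-point operations per byte moved to or from memory."""
    return flops / (memory_accesses * bytes_per_access)


# One addition, two loads and one store per element.
intensity = arithmetic_intensity(flops=1, memory_accesses=3)
print(f"{intensity:.3f} FLOP/byte")
```

That is one floating-point operation for every 12 bytes moved, so memory traffic can easily dominate the execution time.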

We can study more favourable situations if we use smaller arrays. In this case, when we run the benchmark again, chances are that the vector will be in the cache already. While the arithmetic intensity hasn’t changed, in this situation the arithmetic computation has higher weight in the overall execution.

Let’s look at two executions of the benchmark. The figure below shows the ratio of execution time of the vector loop with respect to the scalar loop (not vectorised at all). The plot at the left shows the results when the vectorised loop sets the vector length at each iteration, as it is emitted by the compiler. The plot at the right shows the results when the vector length change is hoisted out of the loop: that is, it is set only once before entering the vector body. I did this transformation by editing the assembly output.

Very simple array addition benchmark. The plot at the left contains the results for the program as it is emitted by the compiler with vectorisation enabled. It sets the vector length at each iteration of the vectorised loop. The plot at the right contains the result for a manually modified assembly output so it sets the vector length right before entering the vectorised loop. The benchmark runs 256 times and it was run 50 times for each array length (from 4 to 131072).

In both cases an array of 4 floats does not perform very well because the loop has been vectorised using an interleave factor of 2. So each vector loop iteration wants to process 8 iterations of the original scalar loop. There is some overhead setting up the vector length to 1 in the tail loop (the loop that processes the remaining elements that do not complete a vector) hence the bad performance. This is less noticeable on the right plot as the vector length is only set once before entering the scalar loop (not once per iteration of the loop as it happens on the left).

On the left plot we see that, up to 4096 float elements, the improvement over scalar is modest: around 2.5X the scalar code. I believe changing the vector length (which requires reading the fpscr) introduces some extra cost that limits the performance. On the right plot we see it improves up to ~3.4X. This suggests it is a good idea to hoist the setting of the vector length out of the vector loop when possible. We will look into it in a later chapter.

Starting from 4096, both plots show a similar performance degradation. The reason is that our working set is now beyond the 16KB of the L1 data cache. Note that when using arrays of 2048 float elements each array takes 8KB. Given that the benchmark uses two of them in the loop, our working set is 16KB. Beyond that we overflow the cache (the L2, as far as I can tell, is not used) and the performance drops to ~1.3X. The low arithmetic intensity of this benchmark means it quickly becomes a memory-bound problem rather than a CPU-bound one. Vectorisation is unlikely to help with memory-bound problems.
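
The working-set arithmetic above can be sketched as a couple of small helpers. The names are hypothetical, the 16KB figure is the L1 data cache size quoted in the text, and `float` is assumed to take 4 bytes.

```c
#include <stddef.h>

/* L1 data cache size quoted in the text. */
enum { L1_DATA_CACHE_BYTES = 16 * 1024 };

/* Bytes touched by a loop that walks n_arrays float arrays of
   n_elements elements each. */
size_t working_set_bytes(size_t n_elements, size_t n_arrays) {
  return n_elements * n_arrays * sizeof(float);
}

/* Returns 1 when the working set is no larger than the L1 data cache. */
int fits_in_l1(size_t n_elements, size_t n_arrays) {
  return working_set_bytes(n_elements, n_arrays) <= L1_DATA_CACHE_BYTES;
}
```

With two arrays, 2048 elements give exactly 16384 bytes, which still fits, while 4096 elements give 32768 bytes, which does not.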

Too good to be true?

After thinking a bit more about the results, a doubt came to mind. In practice, this is as if we had unrolled the loop, only using hardware instructions to do it. The reason is that the vector mode of the VFP does not bring any performance benefit by itself: the latency of the instructions is scaled by the vector length.

So I extended the benchmark to include an unrolled version of the vector addition, without vectorisation. I think the results speak for themselves.
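
For reference, a manually unrolled scalar version could look like the sketch below. The name `vadd_unrolled`, the three-array signature and the unroll factor of 8 are assumptions for illustration (8 matches the number of scalar iterations covered by one vector iteration mentioned earlier); the unrolled loop in the actual benchmark may differ. The pragma only addresses clang's loop vectoriser, so a real benchmark may need extra care to keep the straight-line body scalar.

```c
#include <stddef.h>

/* Element-wise addition, unrolled by 8 and kept scalar. */
void vadd_unrolled(float *restrict c, const float *restrict a,
                   const float *restrict b, size_t n) {
  size_t i = 0;

  /* Main loop: 8 elements of the original loop per iteration. */
#pragma clang loop vectorize(disable) interleave(disable)
  for (; i + 8 <= n; i += 8) {
    c[i + 0] = a[i + 0] + b[i + 0];
    c[i + 1] = a[i + 1] + b[i + 1];
    c[i + 2] = a[i + 2] + b[i + 2];
    c[i + 3] = a[i + 3] + b[i + 3];
    c[i + 4] = a[i + 4] + b[i + 4];
    c[i + 5] = a[i + 5] + b[i + 5];
    c[i + 6] = a[i + 6] + b[i + 6];
    c[i + 7] = a[i + 7] + b[i + 7];
  }

  /* Tail loop: the remaining elements that do not fill a group of 8. */
#pragma clang loop vectorize(disable) interleave(disable)
  for (; i < n; i++)
    c[i] = a[i] + b[i];
}
```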

Unrolling is almost as competitive in performance as vectorising in the naive way. We can get a bit of an edge if we hoist the set vector length, but this advantage quickly fades away.

So, as I already hypothesised in the first chapter of this series, the only benefit we may obtain from using vector instructions is an improvement in code size.

In the next chapter we will look into trying to hoist the set vector length out of the loops that only set it once (which we expect to be the common case for vectorised loops).
