I believe this is not a very common scenario, but sometimes one has to develop libraries whose scope is the whole process. In such a situation, we may need to identify if a process has already loaded another copy of the library.

Libraries

Consider a library L offering some service. Often we do not care much if a library has been loaded more than once in the memory of a process. The reason is that nowadays most library implementations do not rely on global data. Or if they do, it is global data that is tied to some context object.

There are a number of situations where we may, intentionally or not, load a library more than once. This often involves a mixture of static and dynamic linking. For instance, our executable E may statically or dynamically link L (copy 1) and then dynamically link (or load by means of dlopen) another library L2 that has been statically linked with L (copy 2).

As I mentioned above, most of the time this is not a problem, because most libraries model their services around some context that is used for resource management. In our executable above, E will use contexts from copy 1 of L while L2 will use contexts from copy 2 of L.

However, a number of libraries may provide services whose scope is the whole process, and it may not be desirable (or it may need special handling) to have the library loaded twice. Examples could include, for instance, a process monitoring library or some multi-threaded runtime.

Of course the obvious answer is “don’t do this”, but sometimes it is difficult to avoid and we would like a mechanism to diagnose it.

Requirements

Ideally we want a mechanism that is able to tell if our library has already been loaded in the memory of the process.

However, we want to avoid using files or other global objects (such as IPC objects) because they come with their own set of problems: we have to create them, remove them when done, and we risk leaving stuff behind if we end abnormally, etc.

In the context of Linux we can use two approaches: a first one using environment variables and a second one (ab)using Linux’s own key management infrastructure. Once the process ends, these resources go away without a trace, no matter whether the process ended abnormally or not.

Setting

Let’s prepare an example that showcases a library that should be counting something process-wide. The simplest solution is detecting the error at runtime. A more sophisticated solution (for instance one in which the second copy just forwards everything to the first copy) will not be explored in this post. I will also assume the two copies are the same (or identical when it comes to the part in which they try to detect other copies).

only_one

Our library only_one will offer a very minimal interface, in which we get an increasing number every time we invoke only_get.

lib/only_one.h
#ifndef ONLY_H
#define ONLY_H

int only_get(void);

#endif // ONLY_H
lib/only_one.c
#include "only_one.h"
static char initialized;
// Global state that we do not want to
// accidentally replicate in a process.
static int current_id;

static void initialize(void) {
  if (!initialized) {
    // Not needed. For the sake of the example.
    current_id = -1;

    initialized = 1;
  }
}

int only_get(void) {
  initialize();
  return ++current_id;
}

I will use meson because it is rather concise when expressing build rules. For simplicity we will generate a static library.

meson.build
project('only-library', ['c'], version: '1.0.0')

lib_only_one = static_library('only_one',
                 ['lib/only_one.h',
                  'lib/only_one.c'])

only_two

Now let’s model the issue of having the library used by another one, this time a shared library. This shared library is called only_two.

only_two has a very minimal interface as well.

lib/only_two.h
#ifndef ONLY_TWO_H
#define ONLY_TWO_H

int only_two_get(void);

#endif // ONLY_TWO_H

And its implementation only forwards to only_one. This models the idea that library only_two uses only_one. A more realistic library would bring more value than just forwarding the call.

lib/only_two.c
#include "only_two.h"
#include "only_one.h"

int only_two_get(void) {
    return only_get();
}

When writing dynamic libraries one has to be very careful about symbol visibility. In particular the defaults of ELF, used in Linux, are often too lax. We can restrict that using a version script. In this example we only make one symbol visible for version 1.0. Everything else will not be exported.

lib/only_two.map
LIBONLY_TWO_1.0 {
  global:
    only_two_get;

  local:
    *;
};

Below are the rules for only_two. Version scripts are not fully integrated in meson yet so we need to manually build the proper linker flag option.

meson.build
only_two_version_script_flag = \
  '-Wl,--version-script,@0@/lib/@1@'.format(meson.current_source_dir(), 'only_two.map')
lib_only_two = shared_library('only_two',
                        ['lib/only_two.h',
                         'lib/only_two.c'],
                 link_with : [lib_only_one],
                 link_args: [only_two_version_script_flag],
                 version : '1.0.0')

It may not be obvious at this point, but we’re embedding only_one inside only_two (technically only the functions that only_two uses from only_one, but if those functions use global data, that will be embedded too).

Driver

Ok let’s write a small example using the two libraries.

tools/use_only.c
#include "only_one.h"
#include "only_two.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
  printf("get 0 -> %d\n", only_get());
  printf("get 1 -> %d\n", only_get());
  printf("get 2 -> %d\n", only_get());
  printf("[TWO] get 3 -> %d\n", only_two_get());
  printf("[TWO] get 4 -> %d\n", only_two_get());
  printf("[TWO] get 5 -> %d\n", only_two_get());
  return 0;
}

And its meson build rules:

meson.build
executable('use_only',
  ['tools/use_only.c'],
  include_directories : ['lib'],
  link_with : [lib_only_one, lib_only_two])

If we execute this program we will obtain this:

$ ./use_only
get 0 -> 0
get 1 -> 1
get 2 -> 2
[TWO] get 3 -> 0
[TWO] get 4 -> 1
[TWO] get 5 -> 2

This is wrong if we intend only_get to be a global counter for the process. As I mentioned above, the goal today is not to fix this but instead to report an error at runtime.

Environment variables

Our first approach to detect that only_one has already been loaded will be based on using environment variables.

The main idea is to set an environment variable when the library initializes. If the environment variable was already there we know the library was around.

lib/only_one.c
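#include "only_one.h"
#include <signal.h> // raise, SIGABRT
#include <stdio.h>  // fprintf
#include <stdlib.h> // getenv, setenv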

// Very simple error reporting mechanism.
static void die(const char *msg) {
  fprintf(stderr, "%s", msg);
  raise(SIGABRT);
}

// A key reserved for this library.
static const char ONLY_INSTANCE[] = "__ONLY_INSTANCE";

static void set_single_instance(void) {
  setenv(ONLY_INSTANCE, "1", /* overwrite */ 1);
}

static void check_single_instance(void) {
  const char *only_instance = getenv(ONLY_INSTANCE);
  if (only_instance == NULL) {
    set_single_instance();
  } else {
    die("another copy of the library loaded!\n");
  }
}

static char initialized;
// Global state that we do not want to
// accidentally replicate in a process.
static int current_id;

static void initialize(void) {
  if (!initialized) {
    check_single_instance();

    current_id = -1;
    initialized = 1;
  }
}

Now we can detect this:

$ ./use_only
get 0 -> 0
get 1 -> 1
get 2 -> 2
another copy of the library loaded!
Aborted (core dumped)

This approach works but has a minor problem: setenv is not thread-safe. If our program uses more than one thread, we risk that some other library in some other thread calls getenv at the same time and crashes due to the concurrent access (I assume it is not realistic to protect all the uses of getenv with a mutex/lock).
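One way to narrow (but not eliminate) that window is to run the check when the copy is loaded, which for code linked into the executable happens before main and usually before any extra thread exists. The following is only a sketch under that assumption: it relies on the GCC/Clang `constructor` attribute, and the names `only_one_on_load` and `only_one_copy_registered` are made up for this illustration.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

// A key reserved for this library (same name as in the listing above).
static const char ONLY_INSTANCE[] = "__ONLY_INSTANCE";

// Hypothetical helper: nonzero once some copy has registered itself.
static int only_one_copy_registered(void) {
  return getenv(ONLY_INSTANCE) != NULL;
}

// Runs when this copy is loaded: before main for code linked into the
// executable, during dlopen for a shared library (GCC/Clang extension).
__attribute__((constructor)) static void only_one_on_load(void) {
  if (only_one_copy_registered()) {
    fprintf(stderr, "another copy of the library loaded!\n");
    raise(SIGABRT);
  }
  // The variable must not exist at this point, so do not overwrite.
  if (setenv(ONLY_INSTANCE, "1", /* overwrite */ 0) != 0) {
    fprintf(stderr, "setenv failed\n");
    raise(SIGABRT);
  }
}
```

A copy that arrives through dlopen after threads have been started still runs its constructor concurrently with them, so the race is reduced rather than removed. Note also that environment variables are inherited by child processes; storing the process id as the value and comparing it against getpid() is one possible refinement to tell a stale registration apart.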

Linux key management

Linux has a key management mechanism that is pretty cool. Basically one can register keys. One kind of key is called the "user" key and allows a small payload to be stored. Keys are kept in keyrings. Linux provides a number of predefined keyrings with different scopes. One of them is a process-wide keyring called the process keyring.

The downside is that the system calls used for key management are not wrapped by the GNU C library. Instead, a library called libkeyutils provides the wrappers.

lib/only_one.c
#include "only_one.h"
#include <errno.h>
#include <keyutils.h>
#include <signal.h> // raise, SIGABRT
#include <stdio.h>  // fprintf
#include <sys/types.h>

// Very simple error reporting mechanism.
static void die(const char *msg) {
  fprintf(stderr, "%s", msg);
  raise(SIGABRT);
}

// A key reserved for this library.
static const char ONLY_INSTANCE[] = "__ONLY_INSTANCE";

static void set_single_instance(void) {
  const char payload[] = "1";
  key_serial_t k = add_key("user", ONLY_INSTANCE, payload, sizeof(payload),
                           KEY_SPEC_PROCESS_KEYRING);
  if (k == -1) {
    die("add_key failed\n");
  }
}

static void check_single_instance(void) {
  key_serial_t k =
      request_key("user", ONLY_INSTANCE, NULL, KEY_SPEC_PROCESS_KEYRING);

  if (k == -1) {
    if (errno == ENOKEY) {
      set_single_instance();
    } else {
      die("request_key failed\n");
    }
  } else {
    die("another copy of the library loaded!\n");
  }
}

Now the concern about multi-threading is a bit different. Because these library calls are backed by the kernel, they should do the right thing already. So in principle, if we care about multi-threading, it should suffice to protect check_single_instance with a mutex, so that only one thread can see that the key has not been added. However, getting a mutex that is process-wide (shared by every copy of the library) is not obvious, so we haven’t quite solved the problem here.
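To make the limitation concrete, here is a minimal sketch of the per-copy variant, assuming POSIX threads. The mutex `init_lock` and the counter `check_runs` are introduced only for this illustration, and `check_single_instance` is reduced to a stand-in so the snippet does not depend on libkeyutils.

```c
#include <pthread.h>

// Per-copy lock: it serializes the threads that enter this copy of the
// library, but a second copy has its own, unrelated mutex.
static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static char initialized;
static int current_id;
static int check_runs; // illustration only: how often the check ran

// Stand-in for the keyring-based check shown above.
static void check_single_instance(void) { ++check_runs; }

static void initialize(void) {
  pthread_mutex_lock(&init_lock);
  if (!initialized) {
    check_single_instance();
    current_id = -1;
    initialized = 1;
  }
  pthread_mutex_unlock(&init_lock);
}

int only_get(void) {
  initialize();
  // The counter itself also needs protection once threads are involved.
  pthread_mutex_lock(&init_lock);
  int id = ++current_id;
  pthread_mutex_unlock(&init_lock);
  return id;
}
```

Within one copy the check now runs exactly once, whatever the number of threads. Two different copies initializing at the same time can still both observe a missing key, which is the part that remains unsolved.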

We still need to update our meson build rules. Library libkeyutils provides a pkg-config file, so it is not difficult for meson to find it.

meson.build
project('only-library', ['c'], version: '1.0.0')

libkeyutils_deps = dependency('libkeyutils')

lib_only_one = static_library('only_one',
                 ['lib/only_one.h',
                  'lib/only_one.c'],
                 dependencies : [libkeyutils_deps])
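If depending on libkeyutils is not an option, the same two system calls can in principle be reached through the generic syscall function. The sketch below assumes a Linux toolchain whose kernel headers provide `<linux/keyctl.h>`; the function `register_single_instance` and its return convention are made up for this illustration and are not part of any library.

```c
#include <errno.h>
#include <linux/keyctl.h> // KEY_SPEC_PROCESS_KEYRING
#include <stddef.h>
#include <sys/syscall.h>  // SYS_add_key, SYS_request_key
#include <unistd.h>       // syscall

// Hypothetical helper. Returns 1 if this call registered the key, 0 if
// a key with that name was already visible, and -1 if the kernel
// refused one of the calls (errno is left set by the failing call).
static int register_single_instance(const char *name) {
  long k = syscall(SYS_request_key, "user", name, (const char *)NULL,
                   (long)KEY_SPEC_PROCESS_KEYRING);
  if (k != -1)
    return 0; // found: some copy registered before us
  if (errno != ENOKEY)
    return -1; // e.g. key management disabled or filtered
  const char payload[] = "1";
  k = syscall(SYS_add_key, "user", name, payload, sizeof(payload),
              (long)KEY_SPEC_PROCESS_KEYRING);
  return k == -1 ? -1 : 1;
}
```

A caller would treat 0 as “another copy of the library loaded” and 1 as a successful registration. The -1 case deserves its own decision, because some sandboxed environments block these system calls altogether.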

Where to go from here

It is possible to implement more sophisticated mechanisms on top of these techniques.

For instance, some libraries may be loaded more than once but only one of them may be in a running state. We can keep in the payload of the key (or environment variable) the address of a variable we can use to know if the library is running.

Another option is to design our library around a single “root” (or top level context) variable. Similarly, we can keep the address of the root in the payload of the key or the environment variable. The second copy can avoid allocating the resources for the top level context and just use the previous one.
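Both ideas boil down to publishing an address as text and parsing it back. A minimal sketch of that round trip using the environment variable flavour could look as follows; `struct only_root`, the variable name `__ONLY_ROOT` and the function `acquire_root` are all hypothetical names chosen for this example.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical top level context of the library.
struct only_root {
  int current_id;
};

// Hypothetical variable name, distinct from the one used earlier.
static const char ONLY_ROOT[] = "__ONLY_ROOT";

// Returns the root published by an earlier copy or, if there is none,
// publishes `mine` and returns it. Returns NULL if the environment
// cannot be updated or the stored value is malformed.
static struct only_root *acquire_root(struct only_root *mine) {
  const char *text = getenv(ONLY_ROOT);
  if (text != NULL) {
    char *end;
    uintmax_t addr = strtoumax(text, &end, 16);
    if (end == text || *end != '\0')
      return NULL;
    return (struct only_root *)(uintptr_t)addr;
  }
  // Two hex digits per byte of the address plus the terminator.
  char buf[2 * sizeof(uintptr_t) + 1];
  snprintf(buf, sizeof(buf), "%" PRIxPTR, (uintptr_t)mine);
  if (setenv(ONLY_ROOT, buf, /* overwrite */ 0) != 0)
    return NULL;
  return mine;
}
```

Each copy would call `acquire_root` with its own candidate root during initialization and then work only through the returned pointer. The sketch trusts that the stored address was written by the current process, so it inherits both the thread-safety caveat of setenv and the security caveat below.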

A final but important note: these techniques as presented are not appropriate in security-sensitive contexts. Any adversarial library can easily fake the registration process for nefarious purposes.