I’ve been implementing an algorithm that works on a graph. The algorithm needs to create and destroy lots of nodes and edges to keep track of its state correctly. It also needs to be fast in order to be competitive against a similar algorithm that uses sets of stacks instead of graphs. Profiles show that memory allocations are negatively impacting performance, so maybe a memory pool can help.

Don't reject malloc/free or new/delete lightly

In C, malloc and free and, in C++, new and delete provide the foundations of dynamic memory. Quite often our applications need dynamic memory because the amount of memory they will require is not known at compile time. Instead of creating big arrays and limiting the size of our inputs, which may not be acceptable, dynamic memory allows us to allocate the memory we need at runtime.

These facilities give good performance in general because most applications do a lot of work between the allocations and deallocations. They are general enough to perform well in a broad spectrum of applications. There is usually little incentive not to use them.

That said, there are applications where the amount of “real work” done between allocations and deallocations is very small. I found myself in that situation: I have an algorithm that uses a graph to represent its state, and it creates and destroys many nodes and edges during the computation. Sadly, the amount of work between updates to the graph is really small, which means allocations and deallocations show up in the profile as pure overhead. In these circumstances we are forced to use less general strategies for our dynamic memory. I decided to implement a very simple pool allocator.

Pool allocator

In an algorithm like the one I’m trying to make competitive, there is a large number of allocations and deallocations, all of them of the same kind of object. This implies that all these objects take the same amount of storage in memory, which means that by being less general than plain malloc we can make allocations faster.

A pool allocator is a typical approach to this. A chunk of memory, which I will call an arena, is allocated by the program. This chunk has enough storage for up to a given number of elements, so we can allocate up to that many elements from the arena. As the arena itself is nothing but an array, it is very easy to find storage for the allocations. A pool that only uses a single arena is limited in how many elements it can allocate, and I am interested in a pool allocator that can use more than one arena. This means that when all arenas are full, the pool allocator should allocate another arena (of course using a regular dynamic allocation).

Perhaps the trickiest part of memory allocation is keeping track of what storage is available for later allocations. We can use two techniques: bitmaps and free lists. A bitmap assigns each bit of an integer (32 or 64 bits) to a position of the arena. When the i-th bit of the bitmap is set to 1, the i-th element of the arena is allocated and can be freed; when it is set to 0, that element is free and can be allocated. By using specific instructions like “count leading zeroes” and similar, it is possible to use the bitmap efficiently to tell where the next free element is. There are some downsides: bitmaps take extra storage and, for efficiency reasons, they are limited to the widest integer we can use, which in practice means our arenas will be 32 or 64 elements long. Bitmaps also force us to be a bit clever when allocating memory: the first arena we check should always have available storage (otherwise it means we need a new arena), so we avoid scanning several bitmaps when looking for free storage.
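
To make the idea concrete, a hypothetical 64-element arena could track its slots with a single 64-bit integer along these lines. This is just a sketch, not part of the pool below: bitmap64 and its members are names I made up for the illustration, and __builtin_ctzll is a GCC/Clang builtin that counts trailing zero bits.

#include <cstdint>

// Hypothetical bookkeeping for a 64-element arena. As described above,
// bit i set to 1 means slot i is allocated, 0 means it is free.
struct bitmap64 {
  std::uint64_t bits = 0;

  bool full() const { return bits == ~std::uint64_t(0); }

  // Index of the first free slot. Only valid when !full(), because
  // passing 0 to __builtin_ctzll is undefined.
  unsigned first_free() const { return __builtin_ctzll(~bits); }

  void mark_allocated(unsigned i) { bits |= std::uint64_t(1) << i; }
  void mark_free(unsigned i) { bits &= ~(std::uint64_t(1) << i); }
};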

Free lists are a different technique. A single link is threaded through the free elements, forming a list that we can follow to get the next free item. A free list is easy to maintain during allocation and deallocation. When the free list is empty we simply allocate a new arena and initialize its free list; for arenas with lots of elements this means that allocating a new arena may take longer. Also, as we can free data from different arenas, the free list may end up threading through different arenas. This can harm the spatial locality of successive allocations after a bunch of deallocations of objects in different arenas.
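
Before diving into the implementation, this is the technique reduced to its minimum. The names free_node, push_free and pop_free exist only in this sketch:

// Sketch of an intrusive free list: each free slot reuses its own bytes
// to store the pointer to the next free slot, so no extra memory is needed.
struct free_node {
  free_node *next;
};

free_node *free_list = nullptr;

// Deallocation: put the slot back at the front of the list.
void push_free(void *slot) {
  free_node *n = static_cast<free_node *>(slot);
  n->next = free_list;
  free_list = n;
}

// Allocation: take the slot at the front of the list (nullptr if empty,
// which in the pool means "allocate a new arena").
void *pop_free() {
  if (free_list == nullptr)
    return nullptr;
  free_node *n = free_list;
  free_list = n->next;
  return n;
}

The pool below does exactly this, except that each node is a union that shares its bytes with the storage of a T.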

A very simple implementation in C++11

As this is C++, the pool will be parameterized for each type.

#include <cassert>  // assert
#include <cstddef>  // size_t
#include <memory>   // std::unique_ptr
#include <utility>  // std::forward, std::move

template <typename T> struct minipool {

Items

An arena is formed by several items. Each item is a union of properly sized and aligned storage to hold T and a pointer to the next free item (it can be null).

  union minipool_item {
  private:
    using StorageType = char[sizeof(T)];

    // Points to the next freely available item.
    minipool_item *next;
    // Storage of the item. Note that this is a union
    // so it is shared with the pointer "next" above.
    alignas(alignof(T)) StorageType datum;

This item has a few methods that we will use later. We have to be able to get and set the next item and also get the storage. We also need a function that, given a pointer to T, returns a pointer to the item that contains it; we will use this when freeing a T. As this is a union, a simple cast will do.

  public:
    // Methods for the list of free items.
    minipool_item *get_next_item() const { return next; }
    void set_next_item(minipool_item *n) { next = n; }

    // Methods for the storage of the item.
    T *get_storage() { return reinterpret_cast<T *>(datum); }

    // Given a T* cast it to a minipool_item*
    static minipool_item *storage_to_item(T *t) {
      minipool_item *current_item = reinterpret_cast<minipool_item *>(t);
      return current_item;
    }
  }; // minipool_item

Arenas

Our pool will be made of one or more arenas. An arena is just an array of items and a pointer to the next arena.

  // Arena of items. This is just an array of items and a pointer
  // to another arena. All arenas are singly linked between them.
  struct minipool_arena {
  private:
    // Storage of this arena.
    std::unique_ptr<minipool_item[]> storage;
    // Pointer to the next arena.
    std::unique_ptr<minipool_arena> next;

When creating an arena, all the elements inside it are free, so they have to be in the free list. We need to chain each item to the next one, except for the last one. The arena itself is created using the usual operator new from C++.

  public:
    // Creates an arena with arena_size items.
    minipool_arena(size_t arena_size) : storage(new minipool_item[arena_size]) {
      for (size_t i = 1; i < arena_size; i++) {
        storage[i - 1].set_next_item(&storage[i]);
      }
      storage[arena_size - 1].set_next_item(nullptr);
    }

When a new arena is created, the pool needs access to its storage so it can correctly update the free list.

    // Returns a pointer to the array of items. This is used by the pool to
    // update free_list during initialization or when creating a new arena
    // because the current one is full.
    minipool_item *get_storage() const { return storage.get(); }

Finally, when we create a new arena because the current one is full, we need to link it to the current one so that all the arenas stay chained together.

    // Sets the next arena. Used when the current arena is full and
    // we have created this one to get more storage.
    void set_next_arena(std::unique_ptr<minipool_arena> &&n) {
      assert(!next);

      next.reset(n.release());
    }
  }; // minipool_arena

Data members of the pool

Our pool basically needs to know the current arena and the free list. As this pool allocates arenas all of the same size, we need to store the size of the arenas.

  // Size of the arenas created by the pool.
  size_t arena_size;
  // Current arena. Changes when it becomes full and we want to allocate one
  // more object.
  std::unique_ptr<minipool_arena> arena;
  // List of free elements. The list can be threaded between different arenas
  // depending on the deallocation pattern.
  minipool_item *free_list;

Creation of the pool

When creating a pool we need a first arena. So this is what the constructor does.

  // Creates a new pool that will use arenas of arena_size.
  minipool(size_t arena_size)
      : arena_size(arena_size), arena(new minipool_arena(arena_size)),
        free_list(arena->get_storage()) {}

Allocation

To allocate a new T we simply forward the arguments to the constructor of T. But first we need to find storage for the new T.

If our free list is empty (that is, the current arena is full), we need to create a new arena. We link the new arena to the current one and then make it the current arena. The new arena brings new free items, so we update the free list with the storage of the new arena.

Now that we have storage, it is just a matter of using the free list (which we now know is not empty). We are going to use the first item of the free list, so we retrieve the current item and then update the free list. All that remains is to get the storage of the item, initialize the object using placement new, and return the new object, of course.

  // Allocates an object in the current arena.
  template <typename... Args> T *alloc(Args &&... args) {
    if (free_list == nullptr) {
      // If the current arena is full, create a new one.
      std::unique_ptr<minipool_arena> new_arena(new minipool_arena(arena_size));
      // Link the new arena to the current one.
      new_arena->set_next_arena(std::move(arena));
      // Make the new arena the current one.
      arena.reset(new_arena.release());
      // Update the free_list with the storage of the just created arena.
      free_list = arena->get_storage();
    }

    // Get the first free item.
    minipool_item *current_item = free_list;
    // Update the free list to the next free item.
    free_list = current_item->get_next_item();

    // Get the storage for T.
    T *result = current_item->get_storage();
    // Construct the object in the obtained storage.
    new (result) T(std::forward<Args>(args)...);

    return result;
  }

Deallocation

When deallocating an object, we need to put it back in the free list after we have destroyed the object. For simplicity we are going to put it at the beginning of the free list, so it is just a matter of retrieving the item associated with the freed pointer and then linking it to the free list (which could be empty at this point, but that does not matter). Finally, we update the free list to start at the freed item.

  void free(T *t) {
    // Destroy the object.
    t->T::~T();

    // Convert this pointer to T to its enclosing pointer of an item of the
    // arena.
    minipool_item *current_item = minipool_item::storage_to_item(t);

    // Add the item at the beginning of the free list.
    current_item->set_next_item(free_list);
    free_list = current_item;
  }
}; // minipool<T>

And that’s it. Really simple.

Example

#include "minipool.h"
#include <iostream>

#define TRACE_METHOD() std::cout << this << " " << __PRETTY_FUNCTION__ << "\n";

struct Foo {
  int x = 42;
  Foo() { TRACE_METHOD(); }
  Foo(int x) : x(x) { TRACE_METHOD(); }
  ~Foo() { TRACE_METHOD(); }
};

int main(int argc, char *argv[]) {
  minipool<Foo> mp(256);

  Foo *p1 = mp.alloc();
  Foo *p2 = mp.alloc(44);

  std::cout << "p1->x=" << p1->x << "\n";
  std::cout << "p2->x=" << p2->x << "\n";

  mp.free(p1);
  mp.free(p2);

  return 0;
}

Did it help?

In my case, yes, it did help: using malloc/free instead of the pool makes the program two times slower.

But your mileage may vary, so make sure you benchmark your program before and after. Few things are sadder than optimizing something that won’t have any impact.
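
As an illustration of what such a before/after measurement could look like, here is a crude micro-benchmark sketch. Node, the iteration count, and the allocate-then-free-immediately pattern are arbitrary choices of mine; what really matters is measuring your actual allocation pattern.

#include "minipool.h"
#include <chrono>
#include <iostream>

struct Node {
  int a, b, c;
  Node(int a, int b, int c) : a(a), b(b), c(c) {}
};

// Runs f and returns the elapsed time in seconds.
template <typename F> double time_it(F f) {
  auto start = std::chrono::steady_clock::now();
  f();
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}

int main() {
  const int iterations = 10000000;

  double t_new = time_it([&] {
    for (int i = 0; i < iterations; i++) {
      Node *n = new Node(i, i + 1, i + 2);
      delete n;
    }
  });

  minipool<Node> pool(256);
  double t_pool = time_it([&] {
    for (int i = 0; i < iterations; i++) {
      Node *n = pool.alloc(i, i + 1, i + 2);
      pool.free(n);
    }
  });

  std::cout << "new/delete: " << t_new << "s  minipool: " << t_pool << "s\n";
  return 0;
}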

Where to go from here?

OK. This is very simple and it fulfills my original needs. But it can of course be improved. Some ideas follow:

  • It is easy to make mistakes using this allocator, in particular double frees, so we may want to extend it with consistency checks. For instance, adding a Policy template parameter defaulted to no operations (for efficiency) so it can be set to a hypothetical DebugMemory policy that checks the operations (see the sketch after this list).
  • What about memory leaks? Adding a memory leak checker (also policy controlled, for efficiency) would help detect them.
  • What about bulk destruction? If we destroy the memory pool, the memory is released but the destructors of the remaining objects are not invoked! Options include making this an error (as if it were a memory leak) or bulk-destroying all the objects. The latter option is very convenient for computations that need to allocate temporary data and want to release it all at once. This is not trivial because we have a free list but not a “used list”!
  • What about exceptions?
  • What about alloc and free using a minipool<T>::ptr (or similar) instead of a raw T*? This would let us deallocate these pointers safely, but it comes at the cost that each pointer now needs to know its pool.
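
To give an idea of the first item, a policy-based extension could look roughly like this. NoChecks, DebugMemory, checked_minipool and the on_alloc/on_free hooks are hypothetical names for the sketch, not part of minipool.

#include <cassert>
#include <set>

// Default policy: does nothing, so it costs nothing.
struct NoChecks {
  void on_alloc(void *) {}
  void on_free(void *) {}
};

// Debug policy: tracks live pointers to detect double frees and
// frees of pointers that were never allocated by this pool.
struct DebugMemory {
  std::set<void *> live;
  void on_alloc(void *p) { live.insert(p); }
  void on_free(void *p) {
    bool was_live = live.erase(p) == 1;
    assert(was_live && "double free or pointer not allocated by this pool");
    (void)was_live;
  }
};

// The pool would take the policy as an extra template parameter and call
// policy.on_alloc(result) at the end of alloc and policy.on_free(t) at the
// beginning of free.
template <typename T, typename CheckPolicy = NoChecks>
struct checked_minipool; // sketch only, not implemented here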

Happy hacking.