Today we will toy around a very simple optimization in clang and discuss a bit about separation of concerns when optimizing code.
clang is a C/C++ front end mostly used in the LLVM project. It is designed as a library and offers many services in that new trend of compiler as a service that has been popular in the last years: compilers have shifted from being monolithic pieces of software with narrow purposes to libraries offering language and compilation services with broader applicability. clang itself is not an exception. There is a whole ecosystem of tools built around it.
One of the most common use cases of clang is as a C++ compiler. As a front end it closely follows the C++ standard and it is often used as testbed for new proposals. clang itself does not generate code that can be executed. Instead, clang generates LLVM IR which is then handed to the LLVM framework for the purpose of generating code. The LLVM infrastructure is able to optimize the LLVM IR so in principle a front end does not have to be concerned with optimization, or does it?
C++11 introduced many new features, one of them was
constexpr. In C++03 and C99, the
const keyword conflates two meanings: something that is constant in compile time but also something that may be stored in constant memory. C++03 requires in some contexts constant expressions, this is, expressions that can be evaluated by the compiler (before even running the program being compiled). To do this it allowed the use of some variables (basically integral types) initialized with constant expressions and declared as
const in constant expressions.
This fell short for some cases where we may want the compiler to compute something more elaborated. C++11 solves this problem by introducing the keyword
constexpr which basically means that the entity can be used in a constant expression. This was extended to functions and member functions as well. If the context requires a constant expression, the compiler is mandated to evaluate the constant expression and this may imply calling
C++11 was prudent in the specification of constexpr functions and they are really constrained. They can only contain a single return whose expression must be possible to evaluate at compile time. They can contain other declarations like typedefs, though. The expression in the return can include calls to other constexpr functions including the current one, recursively. These restrictions made the implementation of the constexpr feature relatively simple.
C++14 relaxed the constraints imposed in constexpr functions and it allows constexpr functions to be more general, including other statements than just a return statement. This makes the implementation a bit more complicated because it means that an interpreter must be added to our front end which executes the body of a
Non constant expression context
It is obvious that in C++11/C++14 inside a constant expression context, function calls to constexpr will have to be evaluated. What about other non constant expressions contexts where we call a constexpr function with all arguments as constant expressions? Will the compiler care?
Let's check two popular C++ compilers. We see that GCC generates the same code for both cases but clang decides to call
fib1 even if we know already that the result is 89.
This is a bit surprising and we may wonder whether we may want clang to generate a similar code. Before doing anything, we need to understand what is going on. GCC front end seems to be knowing that the two functions return 89. We can see this by using
-fdump-tree-original which will dump the representation that is going to be passed to the rest of the compiler.
We can do a similar thing in clang, by showing us the generated LLVM IR.
clang simply has not evaluated the constexpr function call in either case.
Important! There is nothing wrong in what clang is doing. It does not have to evaluate the constexpr function call at all as it does not appear in the context of a constant expression.
When we compared the two compilers, why the clang generated code still calls
fib1 while it has been able to simplify the call to
fib2 as 89? The reason is that LLVM has not optimized the recursive function
fib1, it might be too complicated for the existing optimizers. On the other hand, the non-recursive function
fib2 is likely more amenable to optimization and the compiler infers that the whole call will simply be 89.
Where to do the fix?
We can either attempt to improve LLVM or clang.
- Pro: A larger set of users (not only C++ front ends like clang) may benefit from it.
- Con: It may be harder to implement.
- Con: It may be easy to come up with a C++ case that will fail to optimize.
- Con: Makes the C++ front end slower as we're evaluating function calls that we did not evaluate before.
- Pro: No matter how crazy is the
constexprfunction, as long as it is valid we will evaluate it.
- Con: Only C++ benefits from it.
So as a proof of concept I decided to change clang. I basically hacked some changes in
CGCall.cpp. These two files handle the code generation (i.e. LLVM IR actually) for expressions and function calls respectively. If we are calling a
constexpr function, I added an attempt to evaluate the expression as a constant. If the evaluation succeeds I thread a
llvm::Constant* so when the final call is about to be emitted, the constant value is used instead. I gated this to optimization levels higher than
Don't take this as a serious change for clang. It is just a proof of concept and while
check-clang did not report regressions because of this, it must be tested more thoroughly.
Note that we are only handling regular function calls of the form
f(x) and not calls to member functions (that might be constexpr as well) like
a.b(x). I will discuss below why the latter case is more complex and does not fit really well here.
We simply use the constant when the call would be emitted instead.
After these changes (and a few others to get everything compiling again) we get what we expected:
Member function calls
Constructors and a special members like constructors can be
constexpr. But these cases can't really be optimized as easily, the reason is that the implicit parameter (the object pointed by
this) should be evaluatable at constant time. When this happens, though, clang already does a good job for them.
Does this make sense?
While this change looks like a low hanging fruit I'm not sure it makes much sense. clang was never designed to be a front end that optimizes its output. Doing that would have some value as it would free LLVM of some optimization work. But if the same optimization can be done in LLVM the benefit is much higher as all clients (including clang) will take advantage of it. Also, C++14 gives much more freedom when writing
constexpr functions, without requiring recursion. This means that the LLVM IR generated by clang will be much simpler and then LLVM may optimize it much easier. All in all, this change might be a marginal improvement for users that cannot use C++14.