Whose is this optimization?

Today we will toy around a very simple optimization in clang and discuss a bit about separation of concerns when optimizing code.

clang

clang is a C/C++ front end mostly used in the LLVM project. It is designed as a library and offers many services in that new trend of compiler as a service that has been popular in the last years: compilers have shifted from being monolithic pieces of software with narrow purposes to libraries offering language and compilation services with broader applicability. clang itself is not an exception. There is a whole ecosystem of tools built around it.

One of the most common use cases of clang is as a C++ compiler. As a front end it closely follows the C++ standard and it is often used as testbed for new proposals. clang itself does not generate code that can be executed. Instead, clang generates LLVM IR which is then handed to the LLVM framework for the purpose of generating code. The LLVM infrastructure is able to optimize the LLVM IR so in principle a front end does not have to be concerned with optimization, or does it?

constexpr

C++11 introduced many new features, one of them was constexpr. In C++03 and C99, the const keyword conflates two meanings: something that is constant in compile time but also something that may be stored in constant memory. C++03 requires in some contexts constant expressions, this is, expressions that can be evaluated by the compiler (before even running the program being compiled). To do this it allowed the use of some variables (basically integral types) initialized with constant expressions and declared as const in constant expressions.

This fell short for some cases where we may want the compiler to compute something more elaborated. C++11 solves this problem by introducing the keyword constexpr which basically means that the entity can be used in a constant expression. This was extended to functions and member functions as well. If the context requires a constant expression, the compiler is mandated to evaluate the constant expression and this may imply calling constexpr functions.

C++11 was prudent in the specification of constexpr functions and they are really constrained. They can only contain a single return whose expression must be possible to evaluate at compile time. They can contain other declarations like typedefs, though. The expression in the return can include calls to other constexpr functions including the current one, recursively. These restrictions made the implementation of the constexpr feature relatively simple.

// C++11: constexpr Fibonacci
constexpr int fib1(int n)
{
  return n <= 1 ? 1 : fib1(n-1) + fib1(n-2);
}

C++14 relaxed the constraints imposed in constexpr functions and it allows constexpr functions to be more general, including other statements than just a return statement. This makes the implementation a bit more complicated because it means that an interpreter must be added to our front end which executes the body of a constexpr function.

// C++14: constexpr Fibonacci
constexpr int fib2(int n)
{
  int n_1 = 1, n_2 = 1;
  int ret = 1;
  for (int i = 2; i <= n; i++)
  {
    ret = n_1 + n_2;
    n_2 = n_1;
    n_1 = ret;
  }
  return ret;
}

// C++14: The two functions evaluate the same (at least for n==10 :)
static_assert(fib1(10)  == fib2(10));

Non constant expression context

It is obvious that in C++11/C++14 inside a constant expression context, function calls to constexpr will have to be evaluated. What about other non constant expressions contexts where we call a constexpr function with all arguments as constant expressions? Will the compiler care?

int test1()
{
  return fib1(10);
}

int test2()
{
  return fib2(10);
}

Let's check two popular C++ compilers. We see that GCC generates the same code for both cases but clang decides to call fib1 even if we know already that the result is 89.

This is a bit surprising and we may wonder whether we may want clang to generate a similar code. Before doing anything, we need to understand what is going on. GCC front end seems to be knowing that the two functions return 89. We can see this by using -fdump-tree-original which will dump the representation that is going to be passed to the rest of the compiler.

$ g++ -c -std=c++14 -O fib.cc -fdump-tree-original
$ cat fib.cc.003t.original  # some lines omitted for brevity
;; Function int test1() (null)
;; enabled by -tree-original


<<cleanup_point return <retval> = 89>>;


;; Function int test2() (null)
;; enabled by -tree-original


<<cleanup_point return <retval> = 89>>;

We can do a similar thing in clang, by showing us the generated LLVM IR.

$ clang -std=c++14 -O -S -o- -emit-llvm -Xclang -disable-llvm-passes fib.cc # some lines omitted for brevity
; Function Attrs: uwtable
; Function Attrs: uwtable
define i32 @_Z5test2v() #0 {
entry:
  %call = call i32 @_Z4fib2i(i32 10)
  ret i32 %call
}
; Function Attrs: uwtable
define i32 @_Z5test1v() #0 {
entry:
  %call = call i32 @_Z4fib1i(i32 10)
  ret i32 %call
}

clang simply has not evaluated the constexpr function call in either case.

Important! There is nothing wrong in what clang is doing. It does not have to evaluate the constexpr function call at all as it does not appear in the context of a constant expression.

When we compared the two compilers, why the clang generated code still calls fib1 while it has been able to simplify the call to fib2 as 89? The reason is that LLVM has not optimized the recursive function fib1, it might be too complicated for the existing optimizers. On the other hand, the non-recursive function fib2 is likely more amenable to optimization and the compiler infers that the whole call will simply be 89.

Where to do the fix?

We can either attempt to improve LLVM or clang.

Improving LLVM:

Pro: A larger set of users (not only C++ front ends like clang) may benefit from it.
Con: It may be harder to implement.
Con: It may be easy to come up with a C++ case that will fail to optimize.

Improving clang:

Con: Makes the C++ front end slower as we're evaluating function calls that we did not evaluate before.
Pro: No matter how crazy is the constexpr function, as long as it is valid we will evaluate it.
Con: Only C++ benefits from it.

Changing clang

So as a proof of concept I decided to change clang. I basically hacked some changes in CGExpr.cpp and CGCall.cpp. These two files handle the code generation (i.e. LLVM IR actually) for expressions and function calls respectively. If we are calling a constexpr function, I added an attempt to evaluate the expression as a constant. If the evaluation succeeds I thread a llvm::Constant* so when the final call is about to be emitted, the constant value is used instead. I gated this to optimization levels higher than -O0.

Don't take this as a serious change for clang. It is just a proof of concept and while check-clang did not report regressions because of this, it must be tested more thoroughly.

diff --git a/lib/CodeGen/CGExpr.cpp b/lib/CodeGen/CGExpr.cpp
index 1a95eb1045..acd07eb341 100644
--- a/lib/CodeGen/CGExpr.cpp
+++ b/lib/CodeGen/CGExpr.cpp
@@ -3901,8 +3901,19 @@ RValue CodeGenFunction::EmitCallExpr(const CallExpr *E,
   if (E->getCallee()->getType()->isBlockPointerType())
     return EmitBlockCallExpr(E, ReturnValue);
 
+  llvm::Constant *ConstValue = nullptr;
+  if (CGM.getCodeGenOpts().OptimizationLevel > 0)
+    if (auto *DRE =
+            dyn_cast<DeclRefExpr>(E->getCallee()->IgnoreParenImpCasts())) {
+      if (auto FD = dyn_cast<FunctionDecl>(DRE->getDecl())) {
+        Expr::EvalResult Result;
+        if (FD->isConstexpr() && E->EvaluateAsRValue(Result, getContext()))
+          ConstValue = CGM.EmitConstantValue(Result.Val, E->getType(), this);
+      }
+    }
+
   if (const auto *CE = dyn_cast<CXXMemberCallExpr>(E))
    return EmitCXXMemberCallExpr(CE, ReturnValue);

@@ -3923,14 +3934,14 @@ RValue CodeGenFunction::EmitCallExpr(const CallExpr *E,
     return EmitCXXPseudoDestructorExpr(callee.getPseudoDestructorExpr());
   }
 
-  return EmitCall(E->getCallee()->getType(), callee, E, ReturnValue);
+  return EmitCall(E->getCallee()->getType(), callee, E, ReturnValue, ConstValue);
 }

Note that we are only handling regular function calls of the form f(x) and not calls to member functions (that might be constexpr as well) like a.b(x). I will discuss below why the latter case is more complex and does not fit really well here.

We simply use the constant when the call would be emitted instead.

diff --git a/lib/CodeGen/CGCall.cpp b/lib/CodeGen/CGCall.cpp
index e56fe2c1c1..8a794ff229 100644
--- a/lib/CodeGen/CGCall.cpp
+++ b/lib/CodeGen/CGCall.cpp
@@ -3713,11 +3713,16 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
                                  const CGCallee &Callee,
                                  ReturnValueSlot ReturnValue,
                                  const CallArgList &CallArgs,
+                                 llvm::Constant* ConstValue,
                                  llvm::Instruction **callOrInvoke) {
   // FIXME: We no longer need the types from CallArgs; lift up and simplify.
 
   assert(Callee.isOrdinary());
 
+  // If we already know the value of this function call, just use it.
+  if (ConstValue)
+      return RValue::get(ConstValue);
+

After these changes (and a few others to get everything compiling again) we get what we expected:

$ clang -std=c++14 -S -o- -emit-llvm -Xclang -disable-llvm-passes fib.cc -O1 -v
; Some limes omitted for brevity
; Function Attrs: nounwind uwtable
define i32 @_Z5test1v() #0 {
entry:
  ret i32 89
}
; Function Attrs: nounwind uwtable
define i32 @_Z5test2v() #0 {
entry:
  ret i32 89
}

Member function calls

Constructors and a special members like constructors can be constexpr. But these cases can't really be optimized as easily, the reason is that the implicit parameter (the object pointed by this) should be evaluatable at constant time. When this happens, though, clang already does a good job for them.

struct A
{
  int x;
  constexpr A(int x) : x(x + 1) { }
};
int foo1()
{
  return A(3).x;
}

int foo2()
{
  constexpr A a(3);
  return a.x;
}

; clang -std=c++14 -S -o- -emit-llvm -Xclang -disable-llvm-passes test.cc -O0
define i32 @_Z4foo1v() #0 {
entry:
  ; Not sure why this temporary is needed but
  ; likely LLVM will optimize it away.
  %ref.tmp = alloca %struct.A, align 4
  call void @_ZN1AC2Ei(%struct.A* %ref.tmp, i32 3)
  ret i32 4
}
; Function Attrs: noinline nounwind uwtable
define i32 @_Z4foo2v() #1 {
entry:
  ret i32 4
}

Does this make sense?

While this change looks like a low hanging fruit I'm not sure it makes much sense. clang was never designed to be a front end that optimizes its output. Doing that would have some value as it would free LLVM of some optimization work. But if the same optimization can be done in LLVM the benefit is much higher as all clients (including clang) will take advantage of it. Also, C++14 gives much more freedom when writing constexpr functions, without requiring recursion. This means that the LLVM IR generated by clang will be much simpler and then LLVM may optimize it much easier. All in all, this change might be a marginal improvement for users that cannot use C++14.