Think In Geek

In geek we trust

A simple plugin for GCC – Part 1

GCC’s C and C++ compilers provide several extensions that address programming needs not covered by the standards. One of these is the warn_unused_result attribute, which makes the compiler warn when the result of a function call is discarded. Unfortunately, for C++ it does not always work as expected.

Function calls in C/C++

In contrast to many programming languages, C and C++ allow the programmer to ignore the result of a function. This is useful in those situations where the programmer is only interested in the side-effects of the function call (this includes returning extra information through variables passed by reference).

A typical case is the printf family of functions. These functions return the number of characters written. The result is negative if some input/output error happens. For the bounded variants (snprintf/vsnprintf), the result is the number of characters the complete output would have needed, so a value greater than or equal to the size of the buffer means that the output was truncated. Almost no programmer bothers to check the result of these functions because their failure is a sign of a much deeper problem.

This means that most calls (probably 99% of them) to printf are just

printf("hello world\n");

rather than

int k = printf("hello world\n");
if (k < 0) abort(); // or some more elaborate error handling...
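For the bounded variants, checking the result can matter. The following is a minimal sketch of such a check; the helper name format_greeting is hypothetical and not part of the original example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper (illustration only): formats a greeting into `buf`
// and reports whether the whole text fit into the buffer.
bool format_greeting(char *buf, std::size_t size, const char *name)
{
    assert(buf != nullptr && size > 0);

    int written = std::snprintf(buf, size, "hello %s\n", name);
    if (written < 0)
        return false; // an output or encoding error happened

    // snprintf returns the length the complete text would have had, so a
    // value greater than or equal to `size` means the text was cut short.
    return static_cast<std::size_t>(written) < size;
}
```

With a 32-byte buffer and the name "world" the helper should report success, while an 8-byte buffer is too small for the same text. Checks like this are the exception rather than the rule.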

That said, some functions return values that the programmer must not discard. A classical example is malloc.

malloc(sizeof(int)); // A memory leak happens right here

GCC provides the attribute warn_unused_result, which can be applied to functions whose result should not be ignored. The compiler then emits a diagnostic whenever a caller discards the result.

__attribute__((warn_unused_result))
int* new_int(void);
 
void g()
{
    new_int();
}

The code above will cause gcc to emit a warning.

$ gcc -c test.c
test.c: In function ‘g’:
test.c:6:5: warning: ignoring return value of ‘new_int’, declared with attribute warn_unused_result [-Wunused-result]
     new_int();
     ^

For C, this attribute gives predictable results. For instance, it also works if we return a structure by value.

typedef struct A
{
    int *addr;
} A;
 
__attribute__((warn_unused_result))
A build_A(void);
 
void g()
{
    build_A();
}

$ gcc -c test.c
test.c: In function ‘g’:
test.c:11:5: warning: ignoring return value of ‘build_A’, declared with attribute warn_unused_result [-Wunused-result]
     build_A();
     ^

But surprisingly it does not work for C++.

$ g++ -c test.c
# no warnings even with -Wall

What is going on?

C++ is much, much more complicated than C. Even apparently identical code carries much more semantic load in C++ than in C. One of the things that C++ has and C lacks is destructors. Consider the following code:

struct B
{
   int x;
   ~B();
};
 
B f();
 
void g()
{
  f();
}

In line 11 of the code above (the call to f inside g), the function call creates a temporary value that looks unused. But it is not: since this value has class type and that class has a non-trivial destructor (a user-defined destructor is never trivial), the code must invoke the destructor to destroy the temporary. So what at first looks like a discarded value is in fact used. The following code tries to represent what actually happens: a temporary is created with the call to f and then immediately destroyed.

void g()
{
  B _temp( f() ); _temp.B::~B();
}
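We can observe this behaviour at run time. The sketch below is only an illustration: the counter destroyed and the function discard_result are hypothetical names, and B gets a destructor body so that its calls can be counted.

```cpp
#include <cassert>

// Counts destructor calls so that the effect of a "discarded" result
// becomes observable.
static int destroyed = 0;

struct B
{
    int x;
    ~B() { ++destroyed; } // user-provided, hence non-trivial
};

// The returned object is initialized directly from the braced list.
B f() { return {42}; }

// Returns how many B objects were destroyed by the call statement alone.
int discard_result()
{
    int before = destroyed;
    f(); // the temporary is destroyed at the end of this statement
    return destroyed - before;
}
```

Calling discard_result() should return 1: the apparently ignored result of f is destroyed, so the compiler does consider it used.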

If a class does not have a user-defined destructor, the compiler will generate one. Sometimes that compiler-generated destructor does not have to do anything: it exists internally in the compiler but generates no tangible code in our program. These destructors are called trivial. A compiler-generated destructor is trivial when it is not virtual and all the fields (and base classes) of the class are of basic type (integer, pointer, reference, array of basic types, etc.) or of class type (or an array of them) with a trivial destructor as well. Our class A shown above has a trivial destructor because its only field is a pointer to int, which is a basic type.
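The standard library lets us check this property at compile time. The following sketch uses std::is_trivially_destructible from the C++11 header type_traits; the class C is a hypothetical addition that shows how a field with a non-trivial destructor makes the enclosing class non-trivial too.

```cpp
#include <type_traits>

struct A // the class from the C example: a single pointer field
{
    int *addr;
};

struct B // user-provided destructor, even though its body is empty
{
    int x;
    ~B() {}
};

struct C // hypothetical: no destructor of its own, but a field of type B
{
    B b;
};

static_assert(std::is_trivially_destructible<A>::value,
              "A only has a pointer field");
static_assert(!std::is_trivially_destructible<B>::value,
              "a user-provided destructor is never trivial");
static_assert(!std::is_trivially_destructible<C>::value,
              "C must run the destructor of its field b");
```

If any of these assumptions were wrong, the file would fail to compile.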

When the destructor is trivial, the compiler does not emit any call to it, so the temporary object effectively goes unused. These are the cases we want to be diagnosed.

With a motivation already set, we can now move on to implement this in GCC as a plugin.

GCC plugins

OK, GCC is a rather old compiler (according to Wikipedia, its first release was in 1987), but it has evolved over the years and gained new functionality. One such feature is extensibility via plugins. Plugins let us extend the compiler without getting our hands too dirty. The GCC codebase, after 28 years, is huge and comes with its own quirks, so writing a plugin is not trivial but, in my opinion, it can be very rewarding and revealing of how a real compiler works (for good and bad, of course).

Quick installation

At the moment the plugin interface of GCC follows a model similar to that of Linux modules: API stability is not guaranteed between versions. This means that our plugins will be more or less tied to specific versions of GCC. This may not be ideal, but this is how things are in GCC. For this post we will be using GCC 5.2 (released on July 16th 2015). At the time of writing this post, it is highly unlikely that your distribution provides that compiler version as the system compiler, so we will install it in a directory of our choice. This way we will avoid interfering with the system compiler, a sensitive piece of software that we do not want to break!

First, create a directory where we will put everything, enter it and download GCC 5.2

# define a variable BASEDIR that we will use all the time
$ export BASEDIR=$HOME/gcc-plugins
# Create the directory, if it does not exist
$ mkdir -p $BASEDIR
# Enter the new directory
$ cd $BASEDIR
# Download gcc using 'wget' ('curl' can be used too)
$ wget http://ftp.gnu.org/gnu/gcc/gcc-5.2.0/gcc-5.2.0.tar.bz2
# Unpack the file
$ tar xfj gcc-5.2.0.tar.bz2

The next step involves building GCC. First we need to get some software required by GCC itself.

# Enter in the source code directory of GCC
$ cd gcc-5.2.0
# And now download the prerequisites
$ ./contrib/download_prerequisites

Now, create a build directory sibling to gcc-5.2.0 and make sure you enter it.

# We are in gcc-5.2.0, go up one level
$ cd ..
# Now create the build directory, gcc-build is a sensible name
$ mkdir gcc-build
# Enter the build directory
$ cd gcc-build

Now configure the compiler and build it. In this step we will specify where the compiler will be installed. Make sure that you are in gcc-build! This step takes several minutes (about 15 minutes or so, depending on your machine) but you only have to do it once.

# Define an installation path, it must be an absolute path!
# '$BASEDIR/gcc-install' seems an appropriate place
$ export INSTALLDIR=$BASEDIR/gcc-install
# Configure GCC
$ ../gcc-5.2.0/configure --prefix=$INSTALLDIR --enable-languages=c,c++
# Build GCC. 'getconf _NPROCESSORS_ONLN' returns the number of threads
# we can use to build GCC in parallel
$ make -j$(getconf _NPROCESSORS_ONLN)

Now install it

$ make install

Now check the installation

# Create a convenience variable for the path of GCC
$ export GCCDIR=$INSTALLDIR/bin
$ $GCCDIR/g++ --version
g++ (GCC) 5.2.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

That’s it. You can find much more information about installing in the official GCC installation documentation.

Skeleton of our plugin

To let us build plugins, the GCC installation includes a directory full of C++ headers describing its internal structures. We can find out where these headers are installed using GCC itself.

$ $GCCDIR/g++ -print-file-name=plugin
/some/path/gcc-install/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/plugin

Let’s first create a directory where we will put our plugin.

$ mkdir $BASEDIR/gcc-plugins
$ cd $BASEDIR/gcc-plugins

While we could type the compilation commands by hand, that soon becomes tedious, so let’s write a Makefile. We will use our freshly installed GCC 5.2, so make sure to set the Makefile variable GCCDIR below to the value of your $GCCDIR.

GCCDIR = { put here the value of your ${GCCDIR} }
 
CXX = $(GCCDIR)/g++
# Flags for the C++ compiler: enable C++11 and all the warnings, -fno-rtti is required for GCC plugins
CXXFLAGS = -std=c++11 -Wall -fno-rtti 
# Workaround for an issue of -std=c++11 and the current GCC headers
CXXFLAGS += -Wno-literal-suffix
 
# Determine the plugin-dir and add it to the flags
PLUGINDIR=$(shell $(CXX) -print-file-name=plugin)
CXXFLAGS += -I$(PLUGINDIR)/include
 
# top level goal: build our plugin as a shared library
all: warn_unused.so
 
warn_unused.so: warn_unused.o
	$(CXX) $(LDFLAGS) -shared -o $@ $<
 
warn_unused.o : warn_unused.cc
	$(CXX) $(CXXFLAGS) -fPIC -c -o $@ $<
 
clean:
	rm -f warn_unused.o warn_unused.so
 
check: warn_unused.so
	$(CXX) -fplugin=./warn_unused.so -c -x c++ /dev/null -o /dev/null
 
.PHONY: all clean check

This makefile by default will only build the plugin. If we want to test it, we can use make check. Currently the check rule compiles an empty file (actually /dev/null) but this is enough to test it for now.

Now we need to write some code for the file warn_unused.cc. As a starter, let’s make a plugin that does nothing but prove it works.

#include <iostream>
 
// This is the first gcc header to be included
#include "gcc-plugin.h"
#include "plugin-version.h"
 
// We must assert that this plugin is GPL compatible
int plugin_is_GPL_compatible;
 
int plugin_init (struct plugin_name_args *plugin_info,
	     struct plugin_gcc_version *version)
{
  // We check the current gcc loading this plugin against the gcc we used to
  // build this plugin
  if (!plugin_default_version_check (version, &gcc_version))
    {
      std::cerr << "This GCC plugin is for version " << GCCPLUGIN_VERSION_MAJOR
	<< "." << GCCPLUGIN_VERSION_MINOR << "\n";
      return 1;
    }
 
  // Let's print all the information given to this plugin!
 
  std::cerr << "Plugin info\n";
  std::cerr << "===========\n\n";
  std::cerr << "Base name: " << plugin_info->base_name << "\n";
  std::cerr << "Full name: " << plugin_info->full_name << "\n";
  std::cerr << "Number of arguments of this plugin:" << plugin_info->
    argc << "\n";
 
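  for (int i = 0; i < plugin_info->argc; i++)
    {
      // The value of an argument is optional, so it may be NULL
      const char *value = plugin_info->argv[i].value;
      std::cerr << "Argument " << i << ": Key: " << plugin_info->argv[i].key
	<< ". Value: " << (value != NULL ? value : "(none)") << "\n";
    }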
  if (plugin_info->version != NULL)
    std::cerr << "Version string of the plugin: " << plugin_info->
      version << "\n";
  if (plugin_info->help != NULL)
    std::cerr << "Help string of the plugin: " << plugin_info->help << "\n";
 
  std::cerr << "\n";
  std::cerr << "Version info\n";
  std::cerr << "============\n\n";
  std::cerr << "Base version: " << version->basever << "\n";
  std::cerr << "Date stamp: " << version->datestamp << "\n";
  std::cerr << "Dev phase: " << version->devphase << "\n";
  std::cerr << "Revision: " << version->revision << "\n";
  std::cerr << "Configuration arguments: " << version->
    configuration_arguments << "\n";
  std::cerr << "\n";
 
  std::cerr << "Plugin successfully initialized\n";
 
  return 0;
}

Now we can build the plugin.

$ make
/some/path/gcc-install/bin/g++ -std=c++11 -Wall -fno-rtti -Wno-literal-suffix -I/some/path/gcc-install/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/plugin/include -fPIC -c -o warn_unused.o warn_unused.cc
/some/path/gcc-install/bin/g++ -shared -o warn_unused.so warn_unused.o

And test if it works

$ make check
/some/path/gcc-install/bin/g++ -fplugin=./warn_unused.so -c -x c++ /dev/null -o /dev/null
Plugin info
===========
 
Base name: warn_unused
Full name: ./warn_unused.so
Number of arguments of this plugin:0
 
Version info
============
 
Base version: 5.2.0
Date stamp: 20150716
Dev phase: 
Revision: 
Configuration arguments: ../gcc-5.2.0/configure --enable-languages=c,c++ --prefix=/some/path/gcc-install
 
Plugin successfully initialized

It works. Great!

How it works

Plugins are implemented using dynamic libraries. We use the flag -fplugin=file.so, where file.so is the dynamic library that implements our plugin. GCC loads the dynamic library and invokes the function plugin_init. This function is used to initialize the plugin and to register itself inside the compiler. In its current state, our plugin does nothing but verify the version of GCC compatible with this plugin and show some information passed by GCC during loading.
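To get an intuition of what the version check does, here is a simplified standalone model. It is a sketch under an assumption: that the default check accepts the plugin only when every version field matches exactly. The struct version_info and the function same_version are hypothetical stand-ins, not GCC's own types or code.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the version data GCC hands to plugin_init,
// reduced to the fields our plugin prints. This is not the real GCC type.
struct version_info
{
    const char *basever;
    const char *datestamp;
    const char *devphase;
    const char *revision;
    const char *configuration_arguments;
};

// Assumed behaviour: two versions match only if all fields are identical.
bool same_version(const version_info &a, const version_info &b)
{
    assert(a.basever != nullptr && b.basever != nullptr);

    return std::strcmp(a.basever, b.basever) == 0
        && std::strcmp(a.datestamp, b.datestamp) == 0
        && std::strcmp(a.devphase, b.devphase) == 0
        && std::strcmp(a.revision, b.revision) == 0
        && std::strcmp(a.configuration_arguments,
                       b.configuration_arguments) == 0;
}
```

Under this model, even a compiler with the same base version but a different date stamp or different configure arguments would be rejected, which fits the earlier remark that plugins are tied to specific versions of GCC.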

Our plugin license must be compatible with GCC

Since plugins are tightly integrated with the code of GCC, we have to assert in our code that the license of the plugin is GPL compatible by declaring a global variable plugin_is_GPL_compatible.

Next steps

Now that we have the basic infrastructure to build plugins we can continue developing our plugin. But this post is long enough so let’s postpone this until the next time.

You can find the supporting code for this post here. Make sure you fix the GCCDIR variable in the Makefile.


16 thoughts on “A simple plugin for GCC – Part 1”

  • down says:

    Thank you a lot for this very interesting post. Unfortunately I got an error during build. Did I miss a prerequisite?

    ….gcc-install/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/plugin/include/system.h:662:17: fatal error: gmp.h: No such file or directory

    I had to add --disable-multilib when configuring GCC.

    Thank you a lot.

    • Roger Ferrer Ibáñez says:

      Hi,

      maybe you missed the step where you have to invoke ./contrib/download_prerequisites?

      The multilib part might be because your system lacks the 32-bit development files. I think that --disable-multilib can be used as a workaround.

      Kind regards,

      • down says:

        Thank you very much for your help. Actually, I found the solution to my problem at: https://gcc.gnu.org/ml/gcc/2015-02/msg00254.html

        Creating a gmp.h with content:
        #ifndef FAKE_GMP_H
        #define FAKE_GMP_H
        typedef void* mpz_t;
        #endif

        and copying it to: …./gcc-install/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/plugin/include/gmp.h

        works on my machine. I’m not sure why but anyway thank you for your great work.

        • Roger Ferrer Ibáñez says:

          Ah! Makes sense, because I have development files of GMP already installed in my system.

          Thanks for the tip!

  • milly says:

    Hi, I am a beginner in GCC plugins and this post helped me a lot.
    Now I am working on a plugin which needs to get some information about variables, and I wonder how I can determine whether there is a load or store operation somewhere.
    Maybe like LLVM:
    unsigned opCode = ins.getOpcode();
    opCode == Instruction::Load || opCode == Instruction::Store

    Thank you very much.

  • Jun says:

    Roger, this is definitely the best article on GCC plugins! I am very excited to find it, thanks a lot for sharing your knowledge!!!

    A quick question: I want to see how original C code is translated to RTL code, then translated to assembly code. To do this, I want to write a plugin that when compiling C code, it gives me the RTL code, then maps RTL code to assembly code (generated by backend). Do you have any hints on where to look at? Thanks a lot!

    • Roger Ferrer Ibáñez says:

      Hi Jun,

      I’m afraid I cannot fully answer your question. My feeling is that it may be easier for you to start by reading a simple backend like that of the moxie processor (check also the description of the architecture) along with the GCC internals to understand how the whole process works. The general idea is that once you have RTL, it is pattern matched repeatedly until no RTL expressions remain and you’re left only with instructions.

      Kind regards

  • Tiago Silva says:

    Hi,

    Alongside other comments I need to identify memory accesses using a gcc plugin.

    My idea is to call a plugin right before code generation (and possibly also change a little bit of the code generation) in order to analyze where store operations occur so I can substitute them with a virtual instruction.

    By doing this, I want to identify the memory addresses of write instructions for specific variables (identified by their name in the source code) by analyzing the binary after disassembly, and then put the real instructions in the binary. The idea is to use fake instructions to tell me where store operations are being performed after linking.

    I don’t know if my approach is the best one, but if it is, do you have any idea how I can map from the RTL representation to the GENERIC tree, in order to know the names of the variables? Or how I can generate these fake instructions with the plugin?

    I do want to work very close to code generation to avoid any optimization which could remove memory accesses.

    Thanks!

    • Roger Ferrer Ibáñez says:

      Hi Tiago,

      my knowledge of GCC internals falls short here, so I’m afraid that I can’t be of much help.

      That said, given that you want to work on the final binary, maybe a tool like Pin or DynamoRIO would be better suited than attempting to do that in the compiler. However, it is likely that I lack the context to know whether this is a feasible approach.

      Kind regards,
      Roger

      • Tiago Silva says:

        Hello Roger,

        Imagine I know where store operations occur by analyzing GIMPLE and GENERIC representations. Do you know of any way to create a comment in the final assembly file to put some metadata about the memory operation?

        I cannot find any comment related node anywhere…

        Thanks,
        Best Regards,
        Tiago

      • Tiago Silva says:

        Hi Roger,

        Can you take a look at this please: http://stackoverflow.com/questions/42890428/gcc-plugin-adding-a-label-to-the-generated-assembly

        I could really use some input.

        I guess my problem is where I execute the pass, but I’m not sure. Do you have any ideas how I can solve this problem?

        Best Regards,
        Tiago

        • Roger Ferrer Ibáñez says:

          Hi Tiago,

          I’m afraid I cannot help without a way to reproduce your problem. Can you set up some repository somewhere with a minimal plugin as a testcase where you observe this behaviour?

          Kind regards,
          Roger

          • Tiago says:

            Hi Roger,

            I found out the problem and I should probably close that stackoverflow post.

            The problem was that when GIMPLE was translated to RTL the label was seen as unused. Even if I defined a label in the source code itself and did not use it anywhere (for a goto, for example), it would be output as a .LXX.

            I solved the problem, by creating an RTL pass which searches for those unused labels, removes them and adds new RTL labels with the name I want since the information about the label is always present.

            Cheers.

          • Roger Ferrer Ibáñez says:

            Good to know!
