In the last two parts of this series we've seen how to use GCC JIT and how to apply it to a simple regular expression matcher. But an open question remains: has it been worth the effort? This part will try to find out.
If you want to see the generated code (out of curiosity or for debugging), you can dump it into a Graphviz file. For instance, for the regular expression ab*c we generate a match function that looks like this.

[Figure: Graphviz control-flow graph of the generated match function, showing the code of its basic blocks, such as block-08 and block-10.]
Compilation is a costly process, so the speed gains from using a compiled regular expression matcher have to compensate for the extra effort put into JIT compilation.
Let's first run the interpreted matcher to get a baseline. We will use the text of Alice in Wonderland as input.
As you can see, the original matcher is really fast and takes almost no time to run. Let's now see the JIT matcher.
Hey! It's slower! More than 15 times slower! Well, the truth is that Alice in Wonderland is a small text file. If we repeat the test on a big text file of 38 MiB (obtained from this page benchmarking regex libraries), the results are rather different.
Now the results show that our JIT is about 1.55 times faster. Not exactly brilliant, but on big files that extra speed is always welcome.
Out of the critical path
The problem with the existing approach is that the JIT compilation may or may not bring enough benefit to be worthwhile. What if we could have our cake and eat it too? The idea is to JIT compile the matcher concurrently and, if we have not yet finished matching lines, switch from the interpreted matcher to the compiled one. If there was not enough time, well, the compilation is simply discarded.
For the sake of simplicity, we will pass the same arguments as the interpreted code (regexp and text) to the JIT code although, as we know, the regexp parameter will not be used. Also, to avoid complicating the code too much, there will be two global variables, match_fun and regexp, that will be initialized at the beginning of the main function. These two variables will keep the current matcher function being used and the regular expression. The matcher function variable will be initialized with the interpreted matcher.
Then we will proceed to match lines. Since the matcher function may eventually change to the compiled version, we will load it at each iteration, before calling it.
We need to use an atomic_load to avoid data races, because this variable will be written concurrently when the JIT compilation finishes.
To compile the code concurrently, well, we will just spawn a detached pthread before we start matching lines.
A detached pthread will allow us to end the program even if the JIT compilation has not finished yet. This way we will be able to benefit from the fastest execution technique assuming that we can run two threads in parallel. Almost all processors in current computers can.
The thread function, concurrent_jit, simply compiles the code and updates the match_fun variable. In this case we have to use an atomic_store to correctly update the value of match_fun without incurring data races.
Let's see if compiling concurrently improves things for the short case (where compiling is not worthwhile).
Now the timings are much better. The performance is just about 1.15 times worse than the original algorithm. Compared to the 15 times slowdown of the first approach this is quite an improvement. Let's see what happens with the big file.
Great. We do not lose performance; we even improve a bit, about 1.15 times better. Not much of an improvement, but it shows that we really do want to take compilation out of the critical path.
Viewing the concurrent compilation
Thread 1.1.1 is the main thread (the one running the main function) and thread 1.1.2 is the one that compiles our code. As you can see, JIT compilation takes much longer than actually matching the lines. The blue part in the main thread is where all the calls to match happen. Every call to match is rendered as a green flag, but they are drawn overlapping because they are so short.
For the long file, doing Linus.*Linux, the result is quite the opposite: the overall matching process lasts much longer than the JIT compilation.
Right after the code has been compiled, we see that a couple of matching calls become much slower. This is probably caused by the communication between the two threads using atomic instructions.
Within Paraver we can also measure the average duration of the calls to match.
[Table: average duration of the calls to match, by kind of code (interpreted vs. JIT-compiled).]
This is consistent with the speed gain of 1.55 times we observed with jgrep-jit on the big file.
In this short three-part series about GCC JIT we have seen how to apply it to a regular expression matcher to get a bit more speed. The obtained gains are moderate (about 1.55 times). Since compilation may be a slow process, taking it out of the critical path is beneficial in the cases where it might otherwise undermine the overall execution time of the program.
I have uploaded the code used in this series to GitHub.
That's all for today.