Modern C++ as a Better Compiler

Last week I made the case that Standard C++ offers both productivity and performance and showed how C++ can be just as concise and elegant as C# and indeed more so. But I didn’t really address performance so let’s do so now. It’s trendy today to refer to your platform or framework of choice as being native. Everyone’s native these days. Even managed code is native. It wasn’t long ago that such a statement would have been greeted with incredulity but today it seems as if the marketing folks have hijacked the word and we’re all a happy native family. But performance doesn’t lie and when it comes to native code there’s nothing quite like Standard C++.

Here’s a question for you. Can a library outperform a compiler? With all of the talk about native code generation it’s helpful to remember that modern C++ is in many ways a code generator. The C++ community is bursting with impressive examples of developers finding ways to program C++ at compile time, in what is often called metaprogramming or generic programming.
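
To make the idea of programming at compile time a little more concrete, here is a minimal sketch in standard C++. The names triangular and Factorial are purely illustrative and are not part of any library discussed in this article; the point is only that the compiler works out both results before the program ever runs.

```cpp
// Sum 1 + 2 + ... + n. When the argument is a constant expression the
// compiler can evaluate the call entirely at compile time.
constexpr unsigned long long triangular(unsigned long long n)
{
    return n * (n + 1) / 2;
}

// Classic template metaprogramming: the value is produced by recursive
// template instantiation during compilation.
template <unsigned long long N>
struct Factorial
{
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

// Base case that ends the recursion.
template <>
struct Factorial<0>
{
    static constexpr unsigned long long value = 1;
};

// Both checks are performed by the compiler, not at run time.
static_assert(triangular(52) == 1378, "computed at compile time");
static_assert(Factorial<5>::value == 120, "computed at compile time");
```

If either static_assert were false the program would simply fail to compile, which is one small way in which a library can lean on the compiler as a code generator.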

But how can a library outperform a compiler? Standard C++ attempts to be the best possible language in which to write very efficient libraries. While parts of the language may be complex, libraries written in C++ should be easy to use. Modern C++ for the Windows Runtime is just such a library.

But how can a library outperform a compiler?! Doesn’t a library still need to be compiled? Well, let’s run a little experiment. The Windows Runtime provides a good environment in which to compare libraries and compilers. You see, the Windows Runtime defines a binary platform that’s intended to be projected into different programming languages. When you write a Windows app in C# you are using the language projection provided by the C# compiler. The same goes for JavaScript and other language projections that I’ve heard of. But when it comes to C++ it’s a different story. Standard C++ is a different kind of language. Although it offers multiple programming paradigms it has a certain bias toward systems programming. This is why it’s so popular among operating system developers. It’s also not the kind of language that can support a language projection directly via the compiler unless you go and change the C++ language itself. This is precisely what the Visual C++ compiler attempts to do with its C++/CX language extension.

But that’s not how C++ was meant to be used. If the language doesn’t provide what you need then you can write a library. There’s no need to invent a new language or change the fundamental structure of the C++ language itself. Still, because the Visual C++ compiler offers up a compiler-based implementation of a Windows Runtime language projection we can now go ahead and compare the compiler’s performance against that of a Windows Runtime language projection implemented as a library using only Standard C++.

I’ll begin with a simple Windows Runtime component that offers up a class called Sample with a single static property returning an IVectorView of strings. An IVectorView is just a read-only vector with a portable ABI that is understood by different language projections. Using Modern C++ for the Windows Runtime I’m left simply having to implement this Strings method, which represents that static Strings property within the component:

class SampleFactory : public SampleFactoryT<SampleFactory>
{
public:
    
    IVectorView<String> Strings()
    {
        // code goes here
    }
};

Since I’m implementing this component in modern C++, I can use whatever modern or standard libraries that I’m most familiar with as a C++ developer. Let’s use a few standard containers to build a really big vector of strings:

vector<String> values;
wstring const value = L"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

for (unsigned i = 0; i != 10000000; ++i)
{
    values.emplace_back(value.c_str(), i % value.size() + 1);
}

I’m using the standard string to act as a sort of template and then I’m filling a vector of Windows Runtime strings with a range of values that look something like this:

A
AB
ABC
ABCD
ABCDE
ABCDEF
ABCDEFG
...

And this will simply repeat until the vector contains ten million strings. So I have a really big collection of strings. I now need to pass it back to the caller and I can do that by simply wrapping it inside an implementation of IVectorView as follows:

return VectorView(move(values));

So my component’s static property boils down to this:

IVectorView<String> Strings()
{
    vector<String> values;
    // pump full of values ...
    return VectorView(move(values));
}

The VectorView function will create a COM object that implements the necessary interfaces such that this standard vector of strings may be transported across ABI boundaries very efficiently.
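
The implementation of VectorView isn’t shown here, so what follows is only a rough sketch of the general idea under simplifying assumptions: a standard vector is moved, not copied, into an object that exposes it through a read-only virtual interface. IReadOnlyView, OwningView, MakeView and SumLengths are hypothetical names, and std::unique_ptr stands in for the COM reference counting that a real Windows Runtime object would use.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical read-only collection interface; every access is a virtual call,
// much as it would be across a binary interface boundary.
template <typename T>
struct IReadOnlyView
{
    virtual ~IReadOnlyView() = default;
    virtual std::size_t Size() const = 0;
    virtual T const & GetAt(std::size_t index) const = 0;
};

// Takes ownership of an existing vector by moving it, so no elements are copied.
template <typename T>
class OwningView final : public IReadOnlyView<T>
{
public:
    explicit OwningView(std::vector<T> && values) : m_values(std::move(values)) {}

    std::size_t Size() const override { return m_values.size(); }
    T const & GetAt(std::size_t index) const override { return m_values.at(index); }

private:
    std::vector<T> m_values;
};

// Hypothetical factory, loosely analogous in spirit to VectorView(move(values)).
template <typename T>
std::unique_ptr<IReadOnlyView<T>> MakeView(std::vector<T> && values)
{
    return std::unique_ptr<IReadOnlyView<T>>(new OwningView<T>(std::move(values)));
}

// A caller only sees the interface and pays for a virtual call per element.
inline unsigned long long SumLengths(IReadOnlyView<std::wstring> const & view)
{
    unsigned long long sum = 0;
    for (std::size_t i = 0; i != view.Size(); ++i)
    {
        sum += view.GetAt(i).size();
    }
    return sum;
}
```

A caller would write something like auto view = MakeView(std::move(values)); and then pass *view to SumLengths. The design choice worth noticing is that the vector’s storage changes hands once, at construction, and everything afterwards is read-only access through the interface.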

With the component implemented I can return to app development and see what this looks like from various language projections. First let’s look at C#:

var timer = new Stopwatch();
IReadOnlyList<string> strings = Sample.Strings;
long sum = 0;
timer.Start();

foreach (string s in strings)
{
    sum += s.Length;
}

timer.Stop();
m_sum = sum;
m_elapsed = (long)timer.Elapsed.TotalMilliseconds;

I call Sample.Strings to retrieve the collection of strings and then use a C# foreach statement to iterate over the collection. The sum is just a sanity check to confirm that each implementation walked the same collection of strings. In this case I’m using the .NET Framework’s Stopwatch class to measure how long it takes to iterate over the collection.

This is a very simple and unscientific test but it should give us a good idea of how efficiently each language projection can iterate over a collection expressed in terms of Windows Runtime collection interfaces. There’s a lot of memory involved and a lot of virtual function calls. A language projection is going to have to be very careful to manage this efficiently, but I’m sure any decent compiler can figure it out.
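
The sanity-check sum can also be predicted without touching the Windows Runtime at all, since string number i produced by the component has length i % 52 + 1. The sketch below replays only the lengths in plain standard C++; ExpectedSum is a hypothetical helper written for this illustration and is not part of the component or the library.

```cpp
#include <cassert>
#include <cstddef>

// Sum of the string lengths produced by the component's fill loop, where
// string number i has length (i % templateLength) + 1.
unsigned long long ExpectedSum(std::size_t count, std::size_t templateLength)
{
    assert(templateLength != 0); // guard against a modulo by zero

    unsigned long long sum = 0;
    for (std::size_t i = 0; i != count; ++i)
    {
        sum += i % templateLength + 1;
    }
    return sum;
}
```

Calling ExpectedSum(10000000, 52) mirrors the loop in the component. Worked by hand, 192,307 complete passes through the 52-character template contribute 1,378 characters each (264,999,046 in total) and the remaining 36 strings contribute 1 + 2 + ... + 36 = 666, giving 264,999,712.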

I ran a release build of this C# version a few times and it consistently calculated a sum of 264999712 characters in around 2619 milliseconds. Now let’s take a look at C++/CX:

IVectorView<String ^> ^ strings = Sample::Strings;
long long sum = 0;
auto start = Now();

for (String ^ s : strings)
{
    sum += s->Length();
}

m_elapsed = Elapsed(start);
m_sum = sum;

In this case I’m using a pair of functions that use the operating system’s high resolution performance counter to measure milliseconds. Other than that, the samples are equivalent, the sums match, but the elapsed time is around 628 milliseconds. And finally we come to the standard C++ approach:

IVectorView<String> strings = Sample::Strings();
long long sum = 0;
auto start = Now();

for (String const & s : strings)
{
    sum += s.Length();
}

m_elapsed = Elapsed(start);
m_sum = sum;

Here again you’ll notice that the Strings property is projected as a method and that it returns a vector view of strings without any hats. From a performance perspective, the ‘const &’ in the range-based for statement is purely a matter of style and convention; omitting it would make no difference at run time. Again the sums match, but the elapsed time is even faster at 447 milliseconds!
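
The Now and Elapsed helpers used in the two C++ samples aren’t listed in this article. As a rough, portable stand-in, and assuming std::chrono::steady_clock in place of the operating system’s performance counter that the real helpers use, they might look something like this:

```cpp
#include <chrono>

// Assumption: a monotonic standard clock substitutes for the operating
// system's high resolution performance counter.
using Clock = std::chrono::steady_clock;

// Captures the starting point of a measurement.
inline Clock::time_point Now()
{
    return Clock::now();
}

// Returns the whole milliseconds that have passed since 'start'.
inline long long Elapsed(Clock::time_point start)
{
    auto const duration = Clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>(duration).count();
}
```

A monotonic clock matters here because a wall clock can be adjusted in the middle of a measurement, which would make short timings like these meaningless.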

[Figure: performance comparison of the three language projections]

Can a library outperform a compiler? It’s perhaps a bit of a philosophical question but it’s clear that the C++ compiler is insanely good at optimizing Standard C++. The library developer is also in the driver’s seat and is able to optimize everything from resource management to algorithms, iterators, adapters, and so much more. Clearly C# does not provide ‘native’ performance. Although C++/CX gets you a lot closer, it does so by trading away productivity, and you lose the essence of the C++ language. I could go on to explain why C# is so much slower but the bottom line is that only Standard C++ allows you to do anything about it. And that’s the point. If you’re using C# or C++/CX you’re at the mercy of the compiler. Only Standard C++ lets you go beyond the compiler. Modern C++ for the Windows Runtime is for those of you who love C++ but also want to create Windows apps.

24 thoughts on “Modern C++ as a Better Compiler”

  1. Devid

    This looks interesting!
    Now the only question is: where is this Modern C++?
    How can I run these examples?
    On http://moderncpp.com there is no download and no GitHub link or anything like that.
    It is not part of Visual Studio 2015. So how do I test these examples?

  2. Viacheslav Dronov

    Very good example of a truly native language.
    Thank you!!!
    Can you add a test for a C# component?
    var values = new List<string>();
    string value = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    for (var i = 0; i != 10000000; ++i)
    {
        values.Add(value.Substring(0, i % value.Length + 1));
    }
    It would be very interesting to see the result.

      1. Kenny Kerr Post author

        A WinRT component written in C# and called from a C# app will not use the WinRT ABI but instead call the CLR types directly (this is a lot faster).

  3. Martin Brandstätter

    Excellent article.
    But C++ can do way better. This example is single-threaded. Modern programming is all about parallel programming. I am quite sure that you could get way better results by splitting the problem across more threads using futures (it should be easy in this case, with no need to care about concurrency).

    1. Kenny Kerr Post author

      Agreed, but the purpose of the article is to compare the efficiency of different language projections performing the same single-threaded operation.

    1. Kenny Kerr Post author

      There are more efficient ways of solving this particular problem in both C# and C++, but the purpose of the article is to compare the efficiency of different Windows Runtime language projections performing the same single-threaded operation.

      1. trophyninjashrub

        The markup removed my reference to the header name algorithm.

        I wasn’t arguing for performance, but just for more idiomatic code. In fact I bet you’d find there’s either zero performance impact or really close to zero. But we’re talking about using modern C++ where raw loops are often discouraged in non-generic code.

      2. Kenny Kerr Post author

        I don’t disagree at all, but the purpose of the experiment was to compare the language projections as directly as possible. Using code that is specific to C++ or C# would have obscured the central claims I was trying to illustrate.

    1. Kenny Kerr Post author

      I did that for this test, although I would argue that this is a liability that affects the performance of managed code and thus should be considered.

  4. Joren

    I think you might be able to do even better. For one, you haven’t called ‘reserve’ on the vector even though you know exactly how many elements it will contain eventually. Also, there’s no need to move the result out of the function. You can just return by value, and let the compiler elide the copy. Depending on how smart your compiler is, this may already be happening. It may however also be harmful to performance to have the move there. (A quick look at the assembler output of GCC (4.9) showed that with no optimization, the move is actually harmful. At O2 the output was identical.)

    1. Kenny Kerr Post author

      What you’re missing is that I’m not comparing C++ and C#. I am comparing the language projections. There are naturally more efficient ways to solve this problem in both languages, but here I’m comparing the performance of the language projection in performing the same operation that necessarily has to cross component boundaries in a portable manner and involve many virtual function calls behind the scenes.

  5. Thiago Adams

    Hi Kenny,
    Congratulations on Modern C++. I think it’s the best way to interface with Windows RT.
    What I would like to understand better is why C++ programmers should use Windows RT. Is it because sometimes it is the only library available, or because Windows RT is the new Windows OS API?
    I see Windows RT as a new Visual Basic 6 library. But this time, MS is trying to make the library the only path to the OS. In the past, Win32 was the foundation API for everything. But what is the foundation now? Can we trust Windows RT? It seems like C++ WinRT programs are now made of lots of “ActiveXs”.

    1. Kenny Kerr Post author

      Windows RT != Windows Runtime. Thank the Microsoft marketing folks for that disaster.

      Anyway, the Windows Runtime is the technology (like COM) that practically all new Windows APIs are using to allow application developers to access operating system services.

  6. husseindharsi

    Kenny, I have always believed that nothing gets close to C++ in terms of performance. For BMA (business managed applications) C# is used because of its GUI frameworks (WPF/WinForms) etc. It is a nightmare to use MFC…
