Author Archives: Kenny Kerr

C++/WinRT: Coroutines and the Calling Context

Previous: Hosting the Windows Runtime

Let’s now return to the topic of coroutines and the issue of execution or calling context. Since coroutines can be used to introduce concurrency or deal with latency in other APIs, some confusion may arise about the execution context of a given coroutine at any given point in time. Let’s clear a few things up.

Here’s a simple function that will print out some basic information about the calling thread:

void print_context()
{
    printf("thread:%d apartment:", GetCurrentThreadId());

    APTTYPE type;
    APTTYPEQUALIFIER qualifier;
    HRESULT const result = CoGetApartmentType(&type, &qualifier);

    if (result == S_OK)
    {
        puts(type == APTTYPE_MTA ? "MTA" : "STA");
    }
    else
    {
        puts("N/A");
    }
}

That’s by no means an exhaustive or foolproof dump of apartment information, but it is good enough for our purposes today. For the COM nerds out there, N/A is “not applicable” and not the other NA that you’re thinking of. 😊 Recall that there are two primary apartment models. A process has at most one multi-threaded apartment (MTA) and may have any number of single-threaded apartments (STA). Apartments are an unfortunate reality designed to accommodate COM’s remoting architecture but traditionally used to support COM objects that were not thread safe.

The single-threaded apartment (STA) is used by COM to ensure that objects are only ever called from the thread on which they were created. Naturally, this implies some mechanism to marshal calls from other threads back on to the apartment thread. STAs typically use a message loop or dispatcher queue for this. The multi-threaded apartment (MTA) is used by COM to indicate that no thread affinity exists but that marshaling is required if a call originates in some other apartment. The relationship between objects and threads is complex, so I’ll save that for another day. The ideal scenario is when an object proclaims that it is agile and thus free from apartment affinity.

Let’s use the print_context function to write a few interesting programs. Here’s one that calls print_context before and after using resume_background, to move work on to a background (thread pool) thread.

IAsyncAction Async()
{
    print_context();
    co_await resume_background();
    print_context();
}

Consider also the following caller:

int main()
{
    init_apartment();
    Async().get();
}

The caller is important because it determines the original or calling context for the coroutine (or any function). In this case, init_apartment is called from main without any arguments. This means that the app’s primary thread will join the MTA. It thus does not require a message loop or dispatcher of any kind. It also means that the thread can happily block execution, as I have done here by using the blocking get function to wait for the coroutine to complete. Since the calling thread is an MTA thread, the coroutine begins execution on the MTA. The resume_background co_await expression suspends the coroutine momentarily so that the calling thread is released back to the caller, and the coroutine is then free to resume as soon as a thread is available from the thread pool, allowing it to continue executing concurrently. Once on the thread pool, otherwise known as a background thread, print_context is called again. Here’s what you might see if you were to run this program:

thread:18924 apartment:MTA
thread:9568 apartment:MTA

The thread identifiers don’t matter. What matters is that they are unique. Notice also that the thread temporarily provided by the thread pool was also an MTA thread. How can this be, since we did not call init_apartment on that thread? If you step into the print_context function you will notice that the APTTYPEQUALIFIER distinguishes between these threads and identifies the thread association as being implicit. Again, I’ll leave a deeper discussion of apartments for another day. Suffice it to say that you can safely assume that a thread pool thread is an MTA thread in practice, provided the process is keeping the MTA alive by some other means.

The key is that the resume_background co_await expression will effectively switch the execution context of a coroutine to the thread pool, regardless of what thread or apartment it was originally executing on. A coroutine should presume that it must not block the caller, and should thus ensure that it suspends before any compute-bound operation that might block the calling thread. That does not mean that resume_background should always be used. A coroutine might exist purely to aggregate some other set of coroutines. In that case, any co_await expression within the coroutine may provide the necessary suspension to ensure that the calling thread is not blocked.

Now consider what happens if we change the caller, the app’s main function as follows:

int main()
{
    init_apartment(apartment_type::single_threaded);

    Async();

    MSG message;

    while (GetMessage(&message, nullptr, 0, 0))
    {
        DispatchMessage(&message);
    }
}

This seems reasonable enough. The primary thread becomes an STA thread, calls the Async function (without blocking in any way), and enters a message loop to ensure that the STA can service cross-apartment calls. The problem is that the MTA has not been created, so the threads owned by the thread pool will not join the MTA implicitly and thus cannot make use of COM services such as activation. Here’s what you might see if you were to run the program now:

thread:17552 apartment:STA
thread:19300 apartment:N/A

Notice that following the resume_background suspension, the coroutine no longer has an apartment context in which to execute code that relies on the COM runtime. If you really need an STA for your primary thread, this problem is easily solved by ensuring that the MTA is “always on” regardless.

int main()
{
    CO_MTA_USAGE_COOKIE mta{};
    check_hresult(CoIncrementMTAUsage(&mta));
    init_apartment(apartment_type::single_threaded);

    Async();

    MSG message;

    while (GetMessage(&message, nullptr, 0, 0))
    {
        DispatchMessage(&message);
    }
}

The CoIncrementMTAUsage function ensures that the MTA is created. The calling thread becomes an implicit member of the MTA until or unless an explicit choice is made to join some other apartment, as is the case here. If I run the program again, I get the desired result:

thread:11276 apartment:STA
thread:9412 apartment:MTA

This is essentially the environment in which most modern Windows apps find themselves, but now you know a bit more about how you can control or create that environment for yourself. Join me next time as we continue to explore coroutines.

C++/WinRT: Hosting the Windows Runtime

Previous: Understanding Weak References and the Dispose Pattern

Most developers will use Windows Runtime APIs from a hosting environment already set up to support those APIs. Whether you are authoring a Windows Runtime API or simply making use of various APIs from an app, chances are you can simply start writing code and not think too much about how it all comes to life. We are also working on making it even more seamless by essentially having an “always on” mode where the Windows Runtime is as ubiquitous as the CRT is for the C++ developer.

So, what do I mean by a hosting environment? It is not as mysterious as it sounds. WinRT is just an extension of COM and relies on COM, specifically the runtime services offered by combase.dll, to implement and support essential services like activation and marshaling. Ideally, most APIs will be agile and thus not affected by marshaling. On the other hand, activation is essential and must occur within the context of a COM apartment.

Traditionally, a developer would call CoInitializeEx to associate the calling thread with some apartment. While the apartment is important, calling CoInitializeEx also ensures that the COM runtime is active within the process, making it possible to retrieve a given WinRT activation factory. CoInitializeEx is by no means the only way to ensure that the COM runtime is loaded. You can, for example, call CoIncrementMTAUsage to achieve the same end. The resulting relationship between thread and apartment will be different, but you will still be able to access WinRT types. WinRT also introduced a variant of CoInitializeEx called RoInitialize. The differences between these three functions (and others) are not insignificant, but they are not important for today’s topic.

As I mentioned, most developers will not have to think too much about hosting and in the future, it should be largely ubiquitous. Until then, C++/WinRT provides a few helper functions that you may need to use to control your environment. The first is init_apartment and you have probably seen it in a variety of console and desktop samples:

int main()
{
    init_apartment();

    // Use C++/WinRT here!
}

The init_apartment function wraps RoInitialize and will, by default, cause the calling thread to join the multi-threaded apartment. A console app will typically be satisfied with just this one call. A desktop app may need a single-threaded apartment:

int __stdcall wWinMain(HINSTANCE, HINSTANCE, LPWSTR, int)
{
    init_apartment(apartment_type::single_threaded);

    // Create window and use C++/WinRT here!

    MSG message;

    while (GetMessage(&message, nullptr, 0, 0))
    {
        DispatchMessage(&message);
    }
}

Keep in mind that you should only call init_apartment on a thread you own. Specifically, do not ever call it from a thread pool callback. If you are writing some helper function, you cannot reliably assume that you can simply call init_apartment. Beyond the two examples already shown, there is rarely a case where you need to call this function yourself. However, if you decide that you do need to call it, then you have the responsibility of taking part in the management of the hosting environment. If you can avoid it, please do so because it can get tricky. That is partly why we are working on ubiquitous runtime support. Until then, there are a few things you should know.

If you call init_apartment you should technically call uninit_apartment, but there is nothing quite like process termination to make sure that everything is cleaned up properly. Therefore, you can generally avoid calling uninit_apartment from an app’s primary thread just before returning from main/WinMain. After all, the CRT is required to terminate the process once main returns.

There are two wrinkles with this plan. The first is that the CRT will call the destructors of any statics before terminating the process. That means that whatever code those destructors execute must be reachable during this short window of time. If you were to call uninit_apartment prior to returning from main/WinMain, this would typically cause the COM runtime to shut down. This in turn causes any DLLs that COM loaded to be abruptly unloaded. However, if any of those statics have outstanding references to objects living inside those DLLs, their destructors will attempt to make virtual calls to Release functions in pages that have already been unloaded. This will cause an access violation and result in undesirable process shutdown crashes.

This of course is only a problem if you have COM statics and Windows developers have known for decades about the perils of COM statics. The other wrinkle is that C++/WinRT caches activation factories to improve performance. The performance gain is substantial, but the challenge is that statics are used for caching. I will say that this problem has already been fixed in RS5 builds of C++/WinRT but remains an issue if you are using the RS4 version of C++/WinRT. Fortunately, C++/WinRT provides another helper function for clearing the cache. So, if you find yourself in the position of having to call uninit_apartment prior to process shutdown then you can call clear_factory_cache first to ensure that those references are released before COM decides to start unloading DLLs.

int main()
{
    init_apartment();

    {
        // Use C++/WinRT here!
    }

    clear_factory_cache();
    uninit_apartment();
}

That is the reason why Raymond Chen calls clear_factory_cache in this example.

C++/WinRT: Understanding Weak References and the Dispose Pattern

Previous: Coroutines and the Thread Pool

I am going to interrupt the mini-series on coroutines to address a topic that came up on an internal discussion forum that may be helpful to a larger audience. Stay tuned for more on coroutines.

You might be surprised to learn that the Windows Runtime supports both weak references and the dispose pattern. At first glance, these do not seem like things that should belong in a platform built on COM and implemented in C++. This naturally leads to some confusion, so let me briefly explain how this came about.

The Component Object Model (COM) is based on a reference counting model for object lifetime and resource management in general. This is all rooted in the IUnknown interface and I describe all of this in detail in The Essentials of COM. Sometime later, Microsoft made the fateful decision (joking) to bet on garbage collection rather than reference counting for the Common Language Runtime, better known as the .NET Framework. As an aside, I must point out that the Windows Runtime is a far more effective “common language runtime” than the CLR ever was. 😊 Still, garbage collection has the nice property that it avoids reference cycles. There is no need for the notion of a weak reference in .NET because the garbage collector is quite capable of detecting cycles and handles them winsomely. Ironically, .NET ended up adding support for a form of weak references anyway. It is always amusing to see systems and languages inspired by C++ decide to design away some of its perceived problems only to realize that they are not so easily avoided. In practice reference cycles can usually be avoided quite easily with good API design. In other words, COM and WinRT can support well-designed APIs without introducing cyclic references or necessitating weak references.

Garbage collection and deterministic finalization are not at odds, but it is true that runtimes that rely on garbage collection tend to favor nondeterministic finalization. Consequently, developers often conflate the two. Brian Harry wrote a fantastic description of .NET’s use of nondeterministic finalization and I highly recommend you spend some time reading it. It is a very helpful history lesson, even though I might not agree with all the conclusions. The short of it is that .NET favors nondeterministic finalization but provides the explicit dispose pattern to support resource management. COM and thus WinRT relies on reference counting and thus inherently supports resource management without any need for something like the dispose pattern. Going further, C++/WinRT provides automatic reference counting, so you do not even have to think about the reference counting architecture and you end up with a very efficient deterministic programming model that is also very concise and less susceptible to resource management errors.

So why does the Windows Runtime offer both the IClosable and IWeakReferenceSource interfaces? Let me start by saying that while these two artifacts have a related etymology, they solve different problems and are essentially orthogonal.

IClosable exists to allow WinRT APIs to be implemented in C++ and consumed by languages that lack deterministic finalization. A C++ caller never needs to use IClosable, unless you are working around a bug, and C++/WinRT will never call it without you knowing it. The CLR will however present types that implement IClosable as if they implement the IDisposable interface, so that C# callers can employ a using statement to gain deterministic resource management of WinRT types implemented in C++.

The notion of weak references, as implemented both by the C++ standard library and by the Windows Runtime, solves a different problem. Sometimes a developer might come to the realization that they simply cannot find a way to break a cyclic reference and said developer might reach for something like shared_ptr and its close friend the weak_ptr. As I said, cycles can often be avoided with good API design but sometimes you just don’t get a choice on that API design.

Of course, a good API design based on reference counting will likely look different than a good API design based on garbage collection. That’s what happened to the Windows Runtime. The first few versions of C++/WinRT lacked support for weak references. I didn’t seem to need an implementation, so I pressed on. When I started exploring XAML support I quickly realized that weak references are essential. XAML was not designed for COM and thus didn’t consider cycles an issue. When it came time to build a native implementation of XAML on the Windows Runtime, a mechanism was needed to handle cycles without fundamentally changing the way XAML works. That mechanism is the weak reference. Weak references are in fact insufficient to solve all of XAML’s reference tracking problems, but that’s a topic for another day.

Naturally, there’s nothing XAML-specific about weak references. A given design might necessitate weak references even if it has nothing to do with either XAML or .NET, hence the existence of shared_ptr and weak_ptr. I will however say that any C++ design that relies on shared_ptr or weak_ptr should think long and hard about whether what they have come up with is really the best design. Weak references and the dispose pattern are a reminder to me of our fallibility as human beings. They are concessions for an imperfect world.

That’s enough history and theory. How do they work? Let me start with IClosable. Consider a runtime class defined with modern IDL as follows:

namespace Component
{
    runtimeclass Class : Windows.Foundation.IClosable,
                         Windows.Foundation.IStringable
    {
        Class();
    }
}

I have thrown in IStringable just for illustration purposes. The cppwinrt -component option, and soon the new Visual Studio project templates, will get you started with the following C++ implementation class:

struct Class : ClassT<Class>
{
    Class() = default;

    void Close();
    hstring ToString();
};

I could have just crafted up an implementation directly, rather than relying on IDL or the cppwinrt -component option. Here’s essentially what this boils down to using only the C++/WinRT library:

struct Class : implements<Class, IClosable, IStringable>
{
    Class() = default;

    void Close();
    hstring ToString();
};

The first implements the entire WinRT class while the second merely implements the WinRT interfaces. Either way, the semantics and implementation of IClosable are the same. Now consider the following tracing implementation:

struct Class : ClassT<Class>
{
    Class()
    {
        puts("Class");
    }

    ~Class()
    {
        puts("~Class");
    }

    void Close()
    {
        puts("Close");
    }

    hstring ToString()
    {
        return L"Class";
    }
};

A C++ caller would look like this:

int main()
{
    Class c;                  // <-- prints "Class" here
    hstring s = c.ToString();
}                             // <-- prints "~Class" here

A C# caller would look like this:

void Main()
{
    using (var c = new Component.Class()) // <-- prints "Class" here
    {
        string s = c.ToString();
    }                                     // <-- prints "Close" here
}

As you can see, their behavior is entirely different. 😊 An implementation might call Close from its destructor, to provide uniform destruction, and should ensure that Close is itself noexcept. You can read Richter’s dire warnings about implementing the dispose pattern correctly. So that’s the dispose pattern and, as you can see, it has nothing to do with weak references.

While the use of weak references should be rare, outside of XAML, it’s not immediately obvious if or when they will be needed on a given type. Certainly, it’s almost impossible for C++/WinRT to know without imposing some syntactic burden on the developer. Consequently, C++/WinRT provides weak reference support automatically unless you explicitly opt out. The implementation itself is pay-for-play, so this doesn’t cost you anything until someone queries for the IWeakReferenceSource interface. For example, here’s how I might get a weak reference to my class:

Class c;

weak_ref<Class> weak(c);

If you find typing the type name annoying, you can use the make_weak helper:

Class c;

auto weak = make_weak(c);

As you might expect, creating the weak reference does not affect the reference count on the object itself, but the act of requesting a weak reference will allocate a control block that takes care of implementing the weak reference semantics. Some time later, the caller may attempt to promote the weak reference to a strong reference:

auto weak = make_weak(c);

if (Class strong = weak.get())
{
    // success!
}

Provided some other strong reference still exists, the get call will increment the reference count and return a strong reference to the caller. While the implementation of IWeakReferenceSource is incomparably more complex than that of IClosable, a developer using C++/WinRT doesn’t have to think about that since C++/WinRT has implemented it once and for all. So, avoid weak references if possible, but if you really need them, by all means make use of this capability.

And that’s the not-so-short story of weak references and the dispose pattern in the Windows Runtime. Join me next time as I continue to explore C++/WinRT.

C++/WinRT: Coroutines and the Thread Pool

Previous: Producing Async Objects

As we saw in the previous installment, creating a basic coroutine is trivial. You can very easily co_await some other async action or operation, simply co_return a value, or craft some combination of the two. To recap, here is a coroutine that is not asynchronous at all:

IAsyncOperation<int> return_123()
{
    co_return 123;
}

Even though it executes synchronously, it still produces a completely valid implementation of the IAsyncOperation interface:

int main()
{
    int result = return_123().get();
    assert(result == 123);
}

Here is one that will wait for five seconds before returning the value:

using namespace std::chrono;

IAsyncOperation<int> return_123_after_5s()
{
    co_await 5s;
    co_return 123;
}

This is ostensibly going to execute asynchronously and yet the main function remains largely unchanged, thanks to the get function’s blocking behavior:

int main()
{
    int result = return_123_after_5s().get();
    assert(result == 123);
}

The co_return statement in the last coroutine will execute on the Windows thread pool, since the co_await expression is a chrono duration that uses a thread pool timer. The co_await statement represents a suspension point and it should be apparent that a coroutine may resume on a completely different thread following suspension. You can also make this explicit using resume_background:

IAsyncOperation<int> background_123()
{
    co_await resume_background();
    co_return 123;
}

There is no apparent delay this time, but the coroutine is guaranteed to resume on the thread pool. What if you are not sure? You might have a cached value and only want to introduce a context switch if the value must be retrieved from latent storage. This is where it is good to remember that a coroutine is also a function, so all the normal rules apply:

IAsyncOperation<int> background_123()
{
    static std::atomic<int> result{0};

    if (result == 0)
    {
        co_await resume_background();
        result = 123;
    }

    co_return result;
}

This is only conditionally going to introduce concurrency. Multiple threads could conceivably race in and call background_123, causing a few of them to resume on the thread pool, but eventually the atomic will be primed and the coroutine will begin to complete synchronously. That is of course the worst case.

Let us imagine the value may only be read from storage once a signal is raised, indicating that the value is ready. We can use two coroutines to pull this off:

handle m_signal{ CreateEvent(nullptr, true, false, nullptr) };
std::atomic<int> m_value{ 0 };

IAsyncAction prepare_result()
{
    co_await 5s;
    m_value = 123;
    SetEvent(m_signal.get());
}

IAsyncOperation<int> return_on_signal()
{
    co_await resume_on_signal(m_signal.get());
    co_return m_value;
}

The first coroutine artificially waits for five seconds, sets the value, and then signals the Win32 event. The second coroutine waits for the event to become signaled, and then simply returns the value. Once again, the thread pool is used to wait for the event, leading to an efficient and scalable implementation. Coordinating the two coroutines is straightforward:

int main()
{
    prepare_result();

    int result = return_on_signal().get();
    assert(result == 123);
}

The main function kicks off the first coroutine but does not block waiting for its completion. The second coroutine immediately begins waiting for the value, blocking as it does so.

Thus far, I’ve focused on the thread pool, or what might be called background threads. C++/WinRT loves the Windows thread pool, but invariably you need to get work back onto a foreground thread representing some user interaction. Join me next time as I explore ways to take precise control over the execution context.

C++/WinRT: Producing Async Objects

Previous: Handling Async Completion

Now that we’ve explored the async interfaces and some completion mechanics in general, let’s turn our attention to creating or producing implementations of those four async interfaces. As we’ve already learned, implementing WinRT interfaces with C++/WinRT is very simple. I might for example implement IAsyncAction as follows:

struct MyAsync : implements<MyAsync, IAsyncAction, IAsyncInfo>
{
    // IAsyncInfo members...
    uint32_t Id() const;
    AsyncStatus Status() const;
    HRESULT ErrorCode() const;
    void Cancel() const;
    void Close() const;

    // IAsyncAction members...
    void Completed(AsyncActionCompletedHandler const& handler) const;
    AsyncActionCompletedHandler Completed() const;
    void GetResults() const;
}; 

The difficulty comes when you consider how you might implement those methods. While it is not hard to imagine some implementation, it is almost impossible to do it correctly without first reverse engineering how the existing language projections actually implement them. You see, the WinRT async pattern only works if everyone implements these interfaces using a very specific state machine and in exactly the same way. Each language projection makes the same assumptions about how this state machine is implemented and if you happen to implement it in a slightly different way, then bad things will happen.

Thankfully, you don’t have to worry about this because each language projection, with the exception of C++/CX, already implements this correctly for you. Here’s a complete implementation of IAsyncAction thanks to C++/WinRT’s coroutine support:

IAsyncAction CopyAsync()
{
    co_return;
}

Now this isn’t a particularly interesting implementation, but it is very educational and a good example of just how much C++/WinRT is doing for you. Since this is a complete implementation, we can use it to exercise some of what we’ve learned thus far. The CopyAsync function above is a coroutine. The coroutine’s return type is used to stitch together an implementation of both IAsyncAction and IAsyncInfo and the C++ compiler brings it to life at just the right moment. We’ll explore some of those details later, but for now let’s observe how this coroutine works. Consider the following console app:

IAsyncAction CopyAsync()
{
    co_return;
}

int main()
{
    IAsyncAction async = CopyAsync();

    async.get();
}

The main function calls the CopyAsync function, which returns an IAsyncAction. If you forget for a moment what the CopyAsync function’s body or definition looks like, it should be evident that it is just a function that returns an IAsyncAction object. We can therefore use it in all the ways that we’ve already learned.

A coroutine (of this sort) must have a co_return statement or a co_await statement. It may of course have multiple, but it must have at least one of these in order to actually be a coroutine. As you might expect, a co_return statement does not introduce any kind of suspension or asynchrony. Therefore, this CopyAsync function produces an IAsyncAction that completes immediately or synchronously. I can illustrate this as follows:

IAsyncAction Async()
{
    co_return;
}

int main()
{
    IAsyncAction async = Async();
    assert(async.Status() == AsyncStatus::Completed);
}

The assertion is guaranteed to be true. There is no race here. Since CopyAsync is just a function, the caller is blocked until it returns and the first opportunity for it to return happens to be the co_return statement. What this means is that if you have some async contract that you need to implement, but the implementation does not actually need to introduce any asynchrony, then it can simply return the value directly and without blocking or introducing a context switch. Consider a function that downloads and then returns a cached value:

hstring m_cache;

IAsyncOperation<hstring> ReadAsync()
{
    if (m_cache.empty())
    {
        // Download and cache value...
    }

    co_return m_cache;
}

int main()
{
    hstring message = ReadAsync().get();
    printf("%ls\n", message.c_str());
}

The first time ReadAsync is called, the cache is presumably empty, and the result is downloaded. Presumably the coroutine suspends while the download takes place. We’ll talk more about how exactly suspension works a little later. Suspension implies that execution returns to the caller. The caller is handed an async object that has not in fact completed, hence the need to somehow wait for completion.

The beauty of coroutines is that there’s a single abstraction both for producing async objects and for consuming those same async objects. An API or component author might implement an async method as described above, but an API consumer or app developer may also use coroutines to call and wait for their completion. Let’s now rewrite the main function above to use a coroutine to do the waiting:

IAsyncAction MainAsync()
{
    hstring result = co_await ReadAsync();
    printf("%ls\n", result.c_str());
}

int main()
{
    MainAsync().get();
}

I have essentially taken the body of the old main function and moved it into the MainAsync coroutine. The main function uses the get method to prevent the app from terminating while the work completes asynchronously. The MainAsync function has something new, and that’s the co_await statement. Rather than using the get method to block the calling thread until ReadAsync completes, the co_await statement is used to wait for the ReadAsync function to complete in a cooperative or non-blocking manner. This is what I meant by a suspension point: the co_await statement represents a suspension point. This app only calls ReadAsync once, but you can imagine it being called multiple times in a more interesting app. The first time it gets called, the MainAsync coroutine will actually suspend and return control to its caller. The second time it’s called, it will not suspend at all but rather return the value directly.

Coroutines are very new to many C++ developers so don’t feel bad if this still seems rather magical. We’ll continue to explore coroutines over the next few installments and these concepts should become quite clear. The good news is that you already know enough to begin to make effective use of coroutines to consume async APIs provided by Windows. For example, you should be able to reason about how the following console app works:

#include "winrt/Windows.Web.Syndication.h"

using namespace winrt;
using namespace Windows::Foundation;
using namespace Windows::Web::Syndication;

IAsyncAction MainAsync()
{
    Uri uri(L"https://kennykerr.ca/feed");
    SyndicationClient client;
    SyndicationFeed feed = co_await client.RetrieveFeedAsync(uri);

    for (auto&& item : feed.Items())
    {
        hstring title = item.Title().Text();

        printf("%ls\n", title.c_str());
    }
}

int main()
{
    init_apartment();
    MainAsync().get();
}

Give it a try right now and see just how much fun it is to use modern C++ on Windows.

C++/WinRT: Handling Async Completion

Previous: Understanding Async

Now that you have a handle on async interfaces in general, let's begin to drill down into how they work in a bit more detail. Assuming you're not satisfied with the blocking wait provided by the get method, what other options are there? We'll soon switch gears and focus entirely on coroutines, but for the moment let's take a closer look at those async interfaces to see what they offer. Both the coroutine support and the get method we looked at last time rely on the contract and state machine implied by those interfaces. I won't go into too much detail, because you really don't need to know all that much about it, but let's explore the basics so that they will at least be familiar if you do ever have to dive in and use them directly for something more out of the ordinary.

All four of the async interfaces logically derive from the IAsyncInfo interface. There’s very little you can do with IAsyncInfo and it’s regrettable that it even exists since it adds a bit of overhead. The only IAsyncInfo members that you should really consider are Status, which can tell you whether the async method has completed, and Cancel, which may be used to request cancellation of a long-running operation whose result is no longer needed. I nitpick this design because I really like the async pattern in general and just wish it were perfect because it is so very close.

The Status member can be useful if you need to determine whether an async method has completed without actually waiting for it. Here’s an example:

auto async = ReadAsync();

if (async.Status() == AsyncStatus::Completed)
{
    auto result = async.GetResults();
    printf("%ls\n", result.c_str());
}

Each of the four async interfaces, but not IAsyncInfo itself, provides its own version of the GetResults method, which should only be called once you've determined that the async method has completed. Don't confuse this with the get method provided by C++/WinRT. While GetResults is implemented by the async method itself, get is implemented by C++/WinRT. GetResults will not block if the async method is still running; instead, it will likely throw an hresult_illegal_method_call exception if called prematurely. You can no doubt begin to imagine how the blocking get method is implemented. Conceptually, it looks something like this:

auto get() const
{
    if (Status() != AsyncStatus::Completed)
    {
        // wait for completion somehow...
    }

    return GetResults();
}

The actual implementation is a bit more complicated, but this captures the gist of it. The point here is that GetResults is called regardless of whether it's an IAsyncOperation, which returns a value, or an IAsyncAction, which does not. The reason is that GetResults is responsible for propagating any error that may have occurred within the implementation of the async method and will rethrow an exception as needed. This is why, in my previous installment, I could simply wrap the get call inside a try block and catch any exceptions.

The question that remains is how the caller can wait for completion. Let’s write a non-member get function to see what’s involved. I’ll start with this basic outline, inspired by the conceptual get method above:

template <typename T>
auto get(T const& async)
{
    if (async.Status() != AsyncStatus::Completed)
    {
        // wait for completion somehow...
    }

    return async.GetResults();
}

I want this function template to work with all four of the async interfaces, so I return the result of GetResults unilaterally, even when that result is void. The C++ language makes special provision for exactly this kind of genericity: a return statement may return a void expression from a function whose return type is deduced to be void, and we can be thankful for that.

Each of the four async interfaces provides a unique Completed member that may be used to register a callback – called a delegate – that will be called when the async method completes. In most cases, C++/WinRT will automatically create the delegate for you. All you must do is provide some function-like handler and a lambda is usually the simplest:

async.Completed([](auto&& async, AsyncStatus status)
{
    // It's done!
});

The type of the delegate’s first parameter will be that of the async interface that just completed, but keep in mind that completion should be regarded as a simple signal. In other words, don’t stuff a bunch of code inside the Completed handler. Essentially, you should regard it as a noexcept handler because the async method will not itself know what to do with any failure occurring inside this handler. So what can we do?

Well, I might simply notify a waiting thread using an event. Here’s what our get function might look like:

template <typename T>
auto get(T const& async)
{
    if (async.Status() != AsyncStatus::Completed)
    {
        handle signal = CreateEvent(nullptr, true, false, nullptr);

        async.Completed([&](auto&&, auto&&)
        {
            SetEvent(signal.get());
        });

        WaitForSingleObject(signal.get(), INFINITE);
    }

    return async.GetResults();
}

C++/WinRT’s get methods use a condition variable with a slim reader/writer lock because it’s slightly more efficient. Such a variant might look something like this:

template <typename T>
auto get(T const& async)
{
    if (async.Status() != AsyncStatus::Completed)
    {
        slim_mutex m;
        slim_condition_variable cv;
        bool completed = false;

        async.Completed([&](auto&&, auto&&)
        {
            {
                slim_lock_guard const guard(m);
                completed = true;
            }

            cv.notify_one();
        });

        slim_lock_guard guard(m);
        cv.wait(m, [&] { return completed; });
    }

    return async.GetResults();
}

You can of course use the C++ standard library’s mutex and condition variable if you prefer. The point here is simply that the Completed handler is your hook to wiring up async completion and it can be done quite generically.

Naturally, there’s no reason for you to write your own get function, and more than likely coroutines will be much simpler and more versatile in general. Still, I hope this helps you to appreciate some of the power and flexibility in the Windows Runtime.

That’s all I have time for today. Join me next time as we explore more about async in the Windows Runtime.

C++/WinRT: Understanding Async

Previous: Working with Strings

The Windows Runtime has a relatively simple async model in the sense that, like everything else in the Windows Runtime, it is focused on allowing components to expose async methods and making it simple for apps to call those async methods. It does not in itself provide a concurrency runtime or even anything in the way of building blocks for producing or consuming async methods. Instead, all of that is left up to the individual language projections. This is as it should be and is not meant to trivialize the Windows Runtime’s async pattern. It is no small feat to implement this pattern correctly. Of course, it also means that a developer’s perception of async in the Windows Runtime is very heavily influenced by their language of choice. A developer who has only ever used C++/CX might, for example, wrongly but understandably assume that async is a hot mess.

The ideal concurrency framework for the C# developer will be different from the ideal concurrency framework for the C++ developer. The role of the language projection, then, is to take care of the mechanics of the async pattern and provide a natural bridge to a language-specific implementation.

Coroutines are the preferred abstraction for both implementing and calling async methods in C++, but first let’s make sure we understand how the async model works. Consider a class with a single static method that looks something like this:

struct Sample
{
    Sample() = delete;

    static Windows::Foundation::IAsyncAction CopyAsync();
};

Async methods end with “Async” by convention, so you might think of this as the Copy async method. There might be a blocking or synchronous alternative that is simply called Copy. It is conceivable that a caller might want a blocking Copy method for use by a background thread and a non-blocking, or asynchronous, method for use by a UI thread that cannot afford to block for fear of appearing unresponsive.

At first, the CopyAsync method may seem quite simple to call. I might write the following C++ code:

IAsyncAction async = Sample::CopyAsync();

As you might imagine, the resulting IAsyncAction is not actually the ultimate result of the async method, even though it is the result of calling the CopyAsync method in a traditional procedural manner. The IAsyncAction is the object that a caller may use to wait upon the result synchronously or asynchronously, depending on the situation. Along with IAsyncAction, there are three other well-known interfaces that follow a similar pattern and offer different features for the callee to communicate information back to the caller. The following table provides a comparison of the four async interfaces.

Name                                           Result  Progress
IAsyncAction                                   No      No
IAsyncActionWithProgress<Progress>             No      Yes
IAsyncOperation<Result>                        Yes     No
IAsyncOperationWithProgress<Result, Progress>  Yes     Yes

In C++ terms, the interfaces can be expressed as follows:

namespace Windows::Foundation
{
    struct IAsyncAction;

    template <typename Progress>
    struct IAsyncActionWithProgress;

    template <typename Result>
    struct IAsyncOperation;

    template <typename Result, typename Progress>
    struct IAsyncOperationWithProgress;
}

IAsyncAction and IAsyncActionWithProgress can be waited upon to determine when the async method completes, but these interfaces do not offer any observable result or return value directly. IAsyncOperation and IAsyncOperationWithProgress, on the other hand, expect the Result type parameter to indicate the type of result that can be expected when the async method completes successfully. Finally, IAsyncActionWithProgress and IAsyncOperationWithProgress expect the Progress type parameter to indicate the type of progress information that can be expected periodically for long-running operations up until the async method completes.

There are a few ways to wait upon the result of an async method. I won’t describe them all here since that would turn this into a very long article. Instead, I’ll save those for next time so that I can give them each the attention they deserve. While there are a variety of ways to handle async completion, there are only two that I recommend. Those two are the async.get() method, which performs a blocking wait, and the co_await async expression, which performs a cooperative wait in the context of a coroutine. Neither is better than the other as they simply serve different purposes. Let’s look at blocking wait today.

As I mentioned, a blocking wait can be achieved using the get() method as follows:

IAsyncAction async = Sample::CopyAsync();

async.get();

There’s seldom any value in holding on to the async object and the following form is thus preferred:

Sample::CopyAsync().get();

It’s important to keep in mind that the get method will block the calling thread until the async method completes. As such, it is not appropriate to use the get method on a UI thread since it may cause the app to become unresponsive. An assertion will fire in unoptimized builds if you attempt to do so. The get method is ideal for console apps or background threads where you may not want to use a coroutine for whatever reason.

Once the async method completes, the get method will return any result directly to the caller. In the case of IAsyncAction and IAsyncActionWithProgress, the return type is void. That might be useful for an async method that initiates a file copy operation, but less so for something like an async method that reads the contents of a file. Let’s add another async method to our example:

struct Sample
{
    Sample() = delete;

    static Windows::Foundation::IAsyncAction CopyAsync();
    static Windows::Foundation::IAsyncOperation<hstring> ReadAsync();
};

In the case of ReadAsync, the get method will properly forward the hstring result to the caller once the operation completes:

Sample::CopyAsync().get();

hstring result = Sample::ReadAsync().get();

Assuming execution returns from the get method, the resulting string will hold whatever value was returned by the async method upon its successful completion. Execution may not return, for example, if an error occurred but we’ll talk more about error handling later.

The get method is limited in the sense that it cannot be used from a UI thread, nor does it exploit the full potential of the machine’s concurrency, since it holds the calling thread hostage until the async method completes. Using a coroutine allows the async method to complete without holding such a precious resource captive for some indeterminate amount of time.

Join me next time as we explore more about async in the Windows Runtime.