New in C++/WinRT: Async Cancellation Callback for Coroutines

Today I’d like to share another new feature in build 17709 of the Windows SDK: a cancellation callback, which happens to be the most frequently requested feature for C++/WinRT’s coroutine support. WinRT has a very specific async pattern that provides cancellation support, but that support does not automatically flow to other async objects, as it does with some other async frameworks and libraries. The cancellation callback gives you an efficient mechanism to make this work and to integrate with or emulate these other async models.

Previously, cancellation within a coroutine was supported in two ways. The first is explicit:

IAsyncAction OneAsync()
{
    auto token = co_await get_cancellation_token();

    while (!token())
    {
        printf("Do some work for 1 second\n");
        co_await 1s;
    }
}

Awaiting the get_cancellation_token function returns a cancellation token with knowledge of the IAsyncAction that the coroutine is producing on your behalf. You can then call the token like a function to query the cancellation state, essentially polling for cancellation. This makes sense if you are performing some compute-bound operation.

The second option is implicit:

IAsyncAction TwoAsync()
{
    while (true)
    {
        printf("Do some work for 1 second\n");
        co_await 1s;
    }
}

At each co_await expression, the coroutine checks whether it has been cancelled before suspending and, if so, short-circuits rather than suspending. This works great, but if the coroutine suspends before it is cancelled, it will not actually come to an end until the nested co_await expression completes and the outer coroutine either returns or reaches its next co_await expression.

These options also made it rather difficult to integrate with existing concurrency libraries as there was no preemptive hook by which cancellation might be propagated. No longer! There is now a third option that allows you to register a cancellation callback. Imagine you have a nested coroutine that does the actual work. Let’s use TwoAsync above for that. You can now write another coroutine that wraps it up and forwards cancellation preemptively as follows:

IAsyncAction ThirdAsync()
{
    auto token = co_await get_cancellation_token();
    auto nested = TwoAsync();

    token.callback([=]
    {
        nested.Cancel();
    });

    co_await nested;
}

Notice that the coroutine registers a lambda as the callback and then simply suspends and waits for the nested action to complete. There’s no need to poll for cancellation and the cancellation isn’t blocked indefinitely. Yay! Naturally, you can use this to interop with other coroutine or concurrency libraries that know nothing of C++/WinRT.
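
If you are curious how polling and a callback can coexist, here is a minimal sketch in portable standard C++. It is only a model and not the C++/WinRT implementation: cancellation_model, cancelled, and forwards_cancellation are hypothetical names, and running a late-registered callback immediately is a choice I made for the sketch.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the cancellation state of an async operation.
class cancellation_model
{
public:
    // Polling, comparable to calling token() in the explicit option.
    bool cancelled() const noexcept
    {
        return m_cancelled.load();
    }

    // Stores the handler, or runs it right away if already cancelled
    // (an assumption made for this sketch).
    void callback(std::function<void()> handler)
    {
        {
            std::lock_guard<std::mutex> const guard(m_lock);

            if (!m_cancelled.load())
            {
                m_handler = std::move(handler);
                return;
            }
        }

        handler();
    }

    // Marks the operation as cancelled and runs the handler at most once.
    void cancel()
    {
        std::function<void()> handler;

        {
            std::lock_guard<std::mutex> const guard(m_lock);

            if (m_cancelled.exchange(true))
            {
                return;
            }

            handler = std::move(m_handler);
        }

        // Invoked outside the lock so the handler may touch this object.
        if (handler)
        {
            handler();
        }
    }

private:
    std::atomic<bool> m_cancelled{ false };
    std::mutex m_lock;
    std::function<void()> m_handler;
};

// Mirrors the wrapper coroutine: cancelling the outer operation is
// forwarded to the nested one without any polling.
bool forwards_cancellation()
{
    cancellation_model outer;
    cancellation_model nested;

    outer.callback([&nested] { nested.cancel(); });
    assert(!nested.cancelled());

    outer.cancel();
    return nested.cancelled();
}
```

The handler is moved out under the lock and invoked afterwards, so it runs at most once and can safely call back into the object that is being cancelled.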

And that’s all for today. Stay tuned for more about the next major update to C++/WinRT and do give it a try today.

New Features and Changes Coming to C++/WinRT: Header Isolation

Build 17709 of the Windows SDK (targeting RS5) includes many new features and improvements as well as a few breaking changes that I’d like to share with you today. This is the first major update to C++/WinRT since it officially launched in build 17134 of Windows (aka RS4).

Over the next few days I’m going to share more details about these improvements. Today I’d like to introduce header isolation.

C++/WinRT no longer depends on headers from the Windows SDK to compile. This is in line with CRT and STL headers that do not include any Windows headers to improve standards compliance and avoid inadvertent dependencies. It also dramatically reduces the number of macros that a C++ developer must guard against. Removing the dependency on the Windows headers means that C++/WinRT is more portable and standards compliant and furthers our efforts to make it a cross-compiler and cross-platform library. It also means that the C++/WinRT headers will never be mangled by macros. If you previously relied on C++/WinRT to include various Windows headers, you will now have to include them yourself. It has always been good practice to include any headers you depend on explicitly and not rely on another library to include them for you.

Consider the following example using build 17134 of the Windows SDK:

#include <winrt/Windows.Data.Json.h>
using namespace winrt::Windows::Data::Json;

void sample()
{
    auto json = JsonValue::Parse(LR"({ "Key": "Value" })");
    auto object = json.GetObjectW();
    printf("%ls\n", object.GetNamedString(L"Key").c_str());
}

Notice that the JsonValue’s GetObject method has a strange trailing W character in its name. The method really is named GetObject in the C++/WinRT headers. Unfortunately, those headers included restrictederrorinfo.h from the Windows SDK, which includes windows.h, which includes wingdi.h and that header defines a macro named GetObject. That’s just one example. There are just too many such examples and some are far more problematic.

As of build 17709, this is no longer a problem because the C++/WinRT headers no longer include those Windows headers. You can of course run into the same issue if you manually include windows.h but then at least it’s under your control and you can manage that yourself.
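
You can reproduce the renaming without any Windows headers at all. The following sketch defines the macro by hand, as a simplified stand-in for the Unicode mapping in wingdi.h; json_value_model and call_renamed_method are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for the Unicode mapping that wingdi.h applies.
#define GetObject GetObjectW

// Hypothetical type. The author wrote the method name as GetObject,
// but the preprocessor rewrites it before the compiler sees it.
struct json_value_model
{
    std::string GetObject() const
    {
        return "object";
    }
};

// Stop the macro from affecting any code that follows.
#undef GetObject

std::string call_renamed_method()
{
    json_value_model value;

    // With the macro gone, GetObjectW is the only spelling that compiles,
    // because that is the name the declaration ended up with.
    std::string const result = value.GetObjectW();
    assert(result == "object");
    return result;
}
```

If the caller had the same macro in effect, it could keep writing GetObject and never notice. The surprise only shows up when the declaration and the call site disagree about which macros are defined.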

There are still a few headers from the Windows SDK that are included for intrinsics and numerics, but they should not introduce any problems. These dependencies will also likely disappear in the long run as we continue to push portability further.

On the other hand, you might rely on interop with the Windows headers for your particular project. Since the C++/WinRT headers no longer include Windows headers, this interop is disabled by default. If, for example, you need to implement a COM interface rooted in ::IUnknown, then you can easily re-enable this interop by including unknwn.h before any C++/WinRT headers. In that way, the C++/WinRT base library will enable various hooks to ensure that this will continue to work. Here’s what that might look like:

#include <unknwn.h>
#include <winrt/Windows.Foundation.h>

struct __declspec(uuid("22155cae-e0e2-49eb-8d20-daac1abda34d")) IClassic : ::IUnknown
{
    virtual HRESULT __stdcall Call() = 0;
};

using namespace winrt;
using namespace Windows::Foundation;

struct Sample : implements<Sample, IStringable, IClassic>
{
    hstring ToString()
    {
        return L"Sample";
    }

    HRESULT __stdcall Call() override
    {
        return S_OK;
    }
};
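
Why does the include order matter? Here is a rough sketch of include-order detection in general, assuming the library tests a preprocessor symbol at the point where its header is included. The macro and function names below are hypothetical and are not taken from the C++/WinRT headers.

```cpp
// Pretend this line came from a classic header that was included first.
#define CLASSIC_HEADER_INCLUDED

// Pretend everything below came from the library header. The check is
// evaluated once, right here, so a classic header included later would
// arrive too late to enable the interop.
#ifdef CLASSIC_HEADER_INCLUDED
constexpr bool classic_interop_enabled = true;
#else
constexpr bool classic_interop_enabled = false;
#endif

bool interop_available() noexcept
{
    return classic_interop_enabled;
}
```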

And that’s all for today. Stay tuned for more about the next major update to C++/WinRT!

C++/WinRT and Beyond

I have been hard at work on C++/WinRT for the RS5 release. There should be an update available with the upcoming Windows SDK for Windows Insider builds very soon. This will include most of the changes coming in RS5 and I will begin to share more about that as soon as it is available.

Beyond RS5, I am starting a new project that builds on C++/WinRT and I cannot wait to share more about that.

Please check back again soon as I will be sharing a lot more information here on kennykerr.ca regularly. I have also rather unexpectedly left Twitter. I may share that tale of woe someday, but feel free to reach out via kenny@kennykerr.ca or via the comments.

If you have specific questions about C++/WinRT, please use the Developer Community.

C++/WinRT: Creating Collections Simply and Efficiently

Previous: Coroutines and the Calling Context

A question recently came up on an internal alias about how to create a WinRT collection, specifically an IVectorView, to represent some allocation of floating point values. Here’s my response. It illustrates how C++/WinRT helps you to create collections very efficiently and with very little effort. WinRT collections are rather complicated internally. C++/WinRT takes all of that complexity out of your hands, saving you a lot of time and effort.

I highly recommend using a std::vector for your data storage. You can then create an IVectorView quite simply:

std::vector<float> values{ 0.1f, 0.2f, 0.3f };
IVectorView<float> view = single_threaded_vector(std::move(values)).GetView();

for (auto&& value : view)
{
    printf("%.2f\n", value);
}

This is very efficient and avoids copies. If you need complete flexibility, you can implement IVectorView and IIterable yourself:

struct Sample : 
    implements<Sample, IVectorView<float>, IIterable<float>>
{
    float GetAt(uint32_t const index);
    uint32_t Size();
    bool IndexOf(float value, uint32_t& index);
    uint32_t GetMany(uint32_t startIndex, array_view<float> values) const;

    IIterator<float> First() const;
};

IVectorView<float> view = make<Sample>();

If you need a bit of help you can use the collection base classes (in RS5) to implement those interfaces:

struct Sample : 
    implements<Sample, IVectorView<float>, IIterable<float>>,
    vector_view_base<Sample, float>
{
    auto& get_container() const noexcept
    {
        return m_values;
    }

    std::vector<float> m_values{ 0.1f, 0.2f, 0.3f };
};

You can also return a custom container if you really don’t want to use std::vector:

struct Sample : 
    implements<Sample, IVectorView<float>, IIterable<float>>,
    vector_view_base<Sample, float>
{
    auto get_container() const noexcept
    {
        struct container
        {
            float const* const first;
            float const* const last;

            auto begin() const noexcept
            {
                return first;
            }

            auto end() const noexcept
            {
                return last;
            }
        };

        return container{ m_values.data(), m_values.data() + m_values.size() };
    }

    std::array<float, 3> m_values{ 0.2f, 0.3f, 0.4f };
};
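
To see why a container only needs begin and end, here is a portable sketch of a read-only view built on nothing but those two functions. It is a model of the idea and not the vector_view_base implementation; view_model and make_view are hypothetical names, and reporting index 0 for a missing value is an assumption of the sketch.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <stdexcept>

// Same shape as the custom container above: just begin() and end().
struct pointer_range
{
    float const* first;
    float const* last;

    auto begin() const noexcept { return first; }
    auto end() const noexcept { return last; }
};

// Hypothetical read-only view over any container with begin() and end().
template <typename Container>
struct view_model
{
    Container container;

    std::uint32_t Size() const
    {
        return static_cast<std::uint32_t>(
            std::distance(container.begin(), container.end()));
    }

    float GetAt(std::uint32_t const index) const
    {
        if (index >= Size())
        {
            throw std::out_of_range("index");
        }

        return *std::next(container.begin(), index);
    }

    // Reports the position of the first match, or 0 with a false result.
    bool IndexOf(float const value, std::uint32_t& index) const
    {
        auto const found = std::find(container.begin(), container.end(), value);
        bool const exists = found != container.end();

        index = exists
            ? static_cast<std::uint32_t>(std::distance(container.begin(), found))
            : 0;

        return exists;
    }
};

// The view does not own the storage, so the array must outlive it.
view_model<pointer_range> make_view(std::array<float, 3> const& values)
{
    view_model<pointer_range> view{ { values.data(), values.data() + values.size() } };
    assert(view.Size() == values.size());
    return view;
}
```

Because the view holds only a pair of pointers, swapping std::array for some other contiguous storage changes nothing but the two arguments used to build the range.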

Additional collection base classes exist for all of the generic collections in the Windows Runtime. There are also a number of cool features that the collection base classes offer for customizing the implementation and I’ll explore those in a future article. You can also learn more about this in our recent talk at Build 2018 where Brent and I introduced many of the productivity improvements in the version of C++/WinRT that ships with the Windows SDK as well as the Visual Studio extension for C++/WinRT that makes it a lot easier to get started with a new project.

C++/WinRT: Coroutines and the Calling Context

Previous: Hosting the Windows Runtime

Let’s now return to the topic of coroutines and the issue of execution or calling context. Since coroutines can be used to introduce concurrency or deal with latency in other APIs, some confusion may arise as to the execution context of a given coroutine at any given point in time. Let’s clear a few things up.

Here’s a simple function that will print out some basic information about the calling thread:

void print_context()
{
    printf("thread:%lu apartment:", GetCurrentThreadId());

    APTTYPE type;
    APTTYPEQUALIFIER qualifier;
    HRESULT const result = CoGetApartmentType(&type, &qualifier);

    if (result == S_OK)
    {
        puts(type == APTTYPE_MTA ? "MTA" : "STA");
    }
    else
    {
        puts("N/A");
    }
}

That’s by no means an exhaustive or foolproof dump of apartment information, but it is good enough for our purposes today. For the COM nerds out there, N/A is “not applicable” and not the other NA that you’re thinking of. 😊 Recall that there are two primary apartment models. A process has at most one multi-threaded apartment (MTA) and may have any number of single-threaded apartments (STA). Apartments are an unfortunate reality designed to accommodate COM’s remoting architecture but traditionally used to support COM objects that were not thread safe.

The single-threaded apartment (STA) is used by COM to ensure that objects are only ever called from the thread on which they were created. Naturally, this implies some mechanism to marshal calls from other threads back on to the apartment thread. STAs typically use a message loop or dispatcher queue for this. The multi-threaded apartment (MTA) is used by COM to indicate that no thread affinity exists but that marshaling is required if a call originates in some other apartment. The relationship between objects and threads is complex, so I’ll save that for another day. The ideal scenario is when an object proclaims that it is agile and thus free from apartment affinity.

Let’s use the print_context function to write a few interesting programs. Here’s one that calls print_context before and after using resume_background, to move work on to a background (thread pool) thread.

IAsyncAction Async()
{
    print_context();
    co_await resume_background();
    print_context();
}

Consider also the following caller:

int main()
{
    init_apartment();
    Async().get();
}

The caller is important because it determines the original or calling context for the coroutine (or any function). In this case, init_apartment is called from main without any arguments. This means that the app’s primary thread will join the MTA. It thus does not require a message loop or dispatcher of any kind. It also means that the thread can happily block execution as I have done here by using the blocking get function to wait for the coroutine to complete. Since the calling thread is an MTA thread, the coroutine begins execution on the MTA. The resume_background co_await expression suspends the coroutine momentarily, releasing the calling thread back to the caller. The coroutine is then free to resume as soon as a thread is available from the thread pool and continues to execute concurrently. Once on the thread pool, otherwise known as a background thread, print_context is once again called. Here’s what you might see if you were to run this program:

thread:18924 apartment:MTA
thread:9568 apartment:MTA

The thread identifiers don’t matter. What matters is that they are different. Notice also that the thread temporarily provided by the thread pool was also an MTA thread. How can this be since we did not call init_apartment on that thread? If you step into the print_context function you will notice that the APTTYPEQUALIFIER distinguishes between these threads and identifies the thread association as being implicit. Again, I’ll leave a deeper discussion of apartments for another day. Suffice it to say that you can safely assume that a thread pool thread is an MTA thread in practice, provided the process is keeping the MTA alive by some other means.

The key is that the resume_background co_await expression will effectively switch the execution context of a coroutine to the thread pool regardless of what thread or apartment it was originally executing on. A coroutine should presume that it must not block the caller and ensure that it is suspended before some compute-bound operation can block the calling thread. That does not mean that resume_background should always be used. A coroutine might exist purely to aggregate some other set of coroutines. In that case, any co_await expression within the coroutine may provide the necessary suspension to ensure that the calling thread is not blocked.
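
If you want to observe the same context switch without any Windows APIs, here is a small portable sketch. It uses a freshly created std::thread as a stand-in for the thread pool, which is an assumption made purely for illustration, and run_in_two_contexts is a hypothetical name. Unlike a co_await, this version blocks the caller until the background step finishes, much like the blocking get call shown earlier.

```cpp
#include <cassert>
#include <thread>
#include <utility>

// Runs one step on the calling thread and the next on a new thread,
// returning both thread identifiers as (calling, background).
std::pair<std::thread::id, std::thread::id> run_in_two_contexts()
{
    std::thread::id const calling = std::this_thread::get_id();
    std::thread::id background;

    std::thread worker([&background]
    {
        // This step observes a different execution context.
        background = std::this_thread::get_id();
    });

    // join synchronizes with the worker, so reading background is safe.
    worker.join();

    assert(background != std::thread::id{});
    return { calling, background };
}
```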

Now consider what happens if we change the caller, the app’s main function, as follows:

int main()
{
    init_apartment(apartment_type::single_threaded);

    Async();

    MSG message;

    while (GetMessage(&message, nullptr, 0, 0))
    {
        DispatchMessage(&message);
    }
}

This seems reasonable enough. The primary thread becomes an STA thread, calls the Async function (without blocking in any way), and enters a message loop to ensure that the STA can service cross-apartment calls. The problem is that the MTA has not been created, so the threads owned by the thread pool will not join the MTA implicitly and can thus not make use of COM services such as activation. Here’s what you might see if you were to run the program now:

thread:17552 apartment:STA
thread:19300 apartment:N/A

Notice that following the resume_background suspension, the coroutine no longer has an apartment context in which to execute code that relies on the COM runtime. If you really need an STA for your primary thread, this problem is easily solved by ensuring that the MTA is “always on” regardless.

int main()
{
    CO_MTA_USAGE_COOKIE mta{};
    check_hresult(CoIncrementMTAUsage(&mta));
    init_apartment(apartment_type::single_threaded);

    Async();

    MSG message;

    while (GetMessage(&message, nullptr, 0, 0))
    {
        DispatchMessage(&message);
    }
}

The CoIncrementMTAUsage function will ensure that the MTA is created. The calling thread becomes an implicit member of the MTA until or unless an explicit choice is made to join some other apartment, as is the case here. If I run the program again, I get the desired result:

thread:11276 apartment:STA
thread:9412 apartment:MTA

This is essentially the environment in which most modern Windows apps find themselves, but now you know a bit more about how you can control or create that environment for yourself. Join me next time as we continue to explore coroutines.

C++/WinRT: Hosting the Windows Runtime

Previous: Understanding Weak References and the Dispose Pattern

Most developers will use Windows Runtime APIs from a hosting environment already set up to support those APIs. Whether you are authoring a Windows Runtime API or simply making use of various APIs from an app, chances are you can simply start writing code and not think too much about how it all comes to life. We are also working on making it even more seamless by essentially having an “always on” mode where the Windows Runtime is as ubiquitous as the CRT is for the C++ developer.

So, what do I mean by a hosting environment? It is not as mysterious as it sounds. WinRT is just an extension of COM and relies on COM, specifically the runtime services offered by combase.dll, to implement and support essential services like activation and marshaling. Ideally, most APIs will be agile and thus not affected by marshaling. On the other hand, activation is essential and must occur within the context of a COM apartment.

Traditionally, a developer would call CoInitializeEx to associate the calling thread with some apartment. While the apartment is important, calling CoInitializeEx also ensures that the COM runtime is active within the process, making it possible to retrieve a given WinRT activation factory. CoInitializeEx is by no means the only way to ensure that the COM runtime is loaded. You can for example call CoIncrementMTAUsage to achieve the same end. The resulting relationship between thread and apartment will be different, but you will still be able to access WinRT types. WinRT also introduced a variant of CoInitializeEx called RoInitialize. The differences between these three functions (and others) are not insignificant, but they are not important for today’s topic.

As I mentioned, most developers will not have to think too much about hosting and in the future, it should be largely ubiquitous. Until then, C++/WinRT provides a few helper functions that you may need to use to control your environment. The first is init_apartment and you have probably seen it in a variety of console and desktop samples:

int main()
{
    init_apartment();

    // Use C++/WinRT here!
}

The init_apartment function wraps RoInitialize and will, by default, cause the calling thread to join the multi-threaded apartment. A console app will typically be satisfied with just this one call. A desktop app may need a single-threaded apartment:

int __stdcall wWinMain(HINSTANCE, HINSTANCE, LPWSTR, int)
{
    init_apartment(apartment_type::single_threaded);

    // Create window and use C++/WinRT here!

    MSG message;

    while (GetMessage(&message, nullptr, 0, 0))
    {
        DispatchMessage(&message);
    }
}

Keep in mind that you should only call init_apartment on a thread you own. Specifically, do not ever call it from a thread pool callback. If you are writing some helper function, you cannot reliably assume that you can simply call init_apartment. Beyond the two examples already shown, there is rarely a case where you need to call this function yourself. However, if you decide that you do need to call it, then you have the responsibility of taking part in the management of the hosting environment. If you can avoid it, please do so because it can get tricky. That is partly why we are working on ubiquitous runtime support. Until then, there are a few things you should know.

If you call init_apartment you should technically call uninit_apartment, but there is nothing quite like process termination to make sure that everything is cleaned up properly. Therefore, you can generally avoid calling uninit_apartment from an app’s primary thread just before returning from main/WinMain. After all, the CRT is required to terminate the process once main returns.

There are two wrinkles with this plan. The first is that the CRT will call the destructors of any statics before terminating the process. That means that whatever code those destructors execute must be reachable during this short window of time. If you were to call uninit_apartment prior to returning from main/WinMain, this would typically cause the COM runtime to shut down. This in turn causes any DLLs that COM loaded to be abruptly unloaded. However, if any of those statics have outstanding references to objects living inside those DLLs, their destructors will attempt to make virtual calls to Release functions in pages that have already been unloaded. This will cause an access violation and result in undesirable process shutdown crashes.

This of course is only a problem if you have COM statics and Windows developers have known for decades about the perils of COM statics. The other wrinkle is that C++/WinRT caches activation factories to improve performance. The performance gain is substantial, but the challenge is that statics are used for caching. I will say that this problem has already been fixed in RS5 builds of C++/WinRT but remains an issue if you are using the RS4 version of C++/WinRT. Fortunately, C++/WinRT provides another helper function for clearing the cache. So, if you find yourself in the position of having to call uninit_apartment prior to process shutdown then you can call clear_factory_cache first to ensure that those references are released before COM decides to start unloading DLLs.

int main()
{
    init_apartment();

    {
        // Use C++/WinRT here!
    }

    clear_factory_cache();
    uninit_apartment();
}

That is the reason why Raymond Chen calls clear_factory_cache in this example.
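
To make the ordering problem concrete, here is a portable model of it. Nothing here is C++/WinRT code: runtime_model, factory_model, and shut_down are hypothetical names, a flag stands in for the COM runtime being loaded, and a log entry stands in for what would really be an access violation.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the COM runtime and the DLLs it has loaded.
struct runtime_model
{
    bool loaded{ true };
    std::vector<std::string> log;
};

// Hypothetical cached factory whose release needs the runtime's code.
struct factory_model
{
    factory_model(runtime_model& runtime, std::string name) :
        m_runtime(runtime),
        m_name(std::move(name))
    {
    }

    ~factory_model()
    {
        // In a real process, releasing after the unload would crash.
        m_runtime.log.push_back(m_name +
            (m_runtime.loaded ? ": released while loaded" : ": released after unload"));
    }

    runtime_model& m_runtime;
    std::string m_name;
};

std::vector<std::string> shut_down(bool const clear_cache_first)
{
    runtime_model runtime;
    assert(runtime.loaded);

    {
        // Models the cache of activation factories held in statics.
        std::unordered_map<std::string, std::unique_ptr<factory_model>> cache;
        cache.emplace("JsonValue", std::make_unique<factory_model>(runtime, "JsonValue"));

        if (clear_cache_first)
        {
            cache.clear(); // models clear_factory_cache
        }

        runtime.loaded = false; // models uninit_apartment unloading DLLs
    } // a cache that was not cleared is destroyed here, after the unload

    return runtime.log;
}
```

Calling shut_down(true) logs a release while the runtime is still loaded, whereas shut_down(false) logs a release after the unload. The second case is the one that crashes in practice.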