New in C++/WinRT: Mastering Strong and Weak References

Today I’d like to share another feature of C++/WinRT, available in build 17709 of the Windows SDK, that really helps you build more complex systems simply and correctly. Distinguishing between strong and weak references is often a necessity in reference-counted systems like WinRT, and knowing how to manage those references correctly can mean the difference between a reliable system that runs smoothly and one that crashes unpredictably. C++/WinRT makes this very simple with a few helper functions that have deep support in the projection.

While I have already talked about how to work with implementations and understand weak references, reference cycles can wreak havoc on well-intentioned designs, and until recently there hasn’t been a simple way to deal with that for your own implementations. Of course, using neither strong nor weak references is just as problematic. Consider this standalone example:

#include <winrt/Windows.Foundation.h>

using namespace winrt;
using namespace Windows::Foundation;
using namespace std::chrono_literals;

struct App : implements<App, IInspectable>
{
    hstring m_value{ L"Hello world" };

    IAsyncOperation<hstring> Async()
    {
        co_await 5s; // suspend for a while before producing the result
        co_return m_value;
    }
};

It seems simple enough. The App class has an Async method that eventually returns the value. Now consider this main function:

int main()
{
    init_apartment();
    auto app = make_self<App>();

    auto async = app->Async();

    auto result = async.get();
    printf("%ls\n", result.c_str());
}

Does this seem reasonable? Here’s what happens:

1. The app is created.
2. The async object is created (pointing to the app).
3. The get function blocks for a few seconds and then returns the result.
4. No problems.

But what if the app object is destroyed before the async operation completes? Consider this one-line change:

int main()
{
    init_apartment();
    auto app = make_self<App>();

    auto async = app->Async();
    app = nullptr; // <-- oops

    auto result = async.get(); // <-- boom!
    printf("%ls\n", result.c_str());
}

That should be harmless, right? After all, the app object is not referred to after that point. Oh, but it is. The async operation attempts to copy the value stored inside the app (via its implicit this pointer). After all, the coroutine is a member function, so it naturally has access to the current object through that pointer. Here’s what happens now:

1. The app is created.
2. The async object is created (pointing to the app).
3. The app is destroyed.
4. The get function blocks for a few seconds and then… BOOM!

As soon as the coroutine attempts to access the member variable inside the app object, it will crash or do something entirely undefined. The solution is to give the async operation – the coroutine – its own strong reference to the app object. As it stands, the coroutine effectively holds a raw this pointer to the app object, and that is not enough to keep the app object alive. The App class may be updated as follows:

struct App : implements<App, IInspectable>
{
    hstring m_value{ L"Hello world" };

    IAsyncOperation<hstring> Async()
    {
        auto strong = get_strong(); // <-- keep alive

        co_await 5s;
        co_return m_value;
    }
};

Now everything works as expected. All outstanding references to the app may disappear, but the coroutine ensures that its dependencies remain stable. Of course, a strong reference may not always be desired. A weak reference is also possible as follows:

struct App : implements<App, IInspectable>
{
    hstring m_value{ L"Hello world" };

    IAsyncOperation<hstring> Async()
    {
        auto weak = get_weak(); // <-- maybe keep alive

        co_await 5s;

        if (auto strong = weak.get())
        {
            co_return m_value;
        }
        else
        {
            co_return L"";
        }
    }
};

In this case, the coroutine holds a weak reference that will not keep the app from being destroyed if no other strong references remain. The coroutine must then check whether a strong reference can be acquired before using the member variable.
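To see the difference, here’s a sketch of my own (not part of the original example) of the earlier main function run against this weak-reference version, assuming the app is released before the operation completes:

int main()
{
    init_apartment();
    auto app = make_self<App>();

    auto async = app->Async();
    app = nullptr; // the last strong reference goes away

    // The coroutine's weak reference fails to resolve, so it returns the
    // fallback value instead of touching a destroyed object.
    auto result = async.get();
    printf("%ls\n", result.c_str()); // prints an empty line
}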

Of course, lifetime issues are not limited to coroutines or concurrency. You can end up in the same boat using traditional callbacks. Consider another standalone example, this time a hypothetical Window class that sports a single event:

struct Window
{
    event<EventHandler<int>> m_event;

    void PointerPressed(EventHandler<int> const& handler)
    {
        m_event.add(handler);
    }

    void RaisePointerPressed()
    {
        m_event(nullptr, 123);
    }
};

I’m using the Windows::Foundation::EventHandler delegate but the parameters don’t matter. It could just as well be any other delegate. Handlers may be registered, all of which will be called when the event is raised. Now consider a typical app that wants to respond to such events:

struct App : implements<App, IInspectable>
{
    hstring m_value{ L"Hello world" };

    void Register(Window& window)
    {
        window.PointerPressed([&](auto&&...)
        {
            printf("%ls\n", m_value.c_str());
        });
    }
};

In many cases, this is in fact perfectly reasonable. For graphical applications, the App object typically outlives the framework that may raise such events. Still, that’s not necessarily the case and it’s important to know how to deal with that. Consider this main function:

int main()
{
    init_apartment();
    Window window;
    auto app = make_self<App>();

    app->Register(window);

    window.RaisePointerPressed();
}

It seems reasonable enough and you should expect this to work reliably, but consider this change:

int main()
{
    init_apartment();
    Window window;
    auto app = make_self<App>();

    app->Register(window);
    app = nullptr; // <-- oops

    window.RaisePointerPressed(); // <-- boom!
}

Suddenly the event handler explodes. Take a closer look at the handler’s registration:

window.PointerPressed([&](auto&&...)
{
    printf("%ls\n", m_value.c_str());
});

The lambda automatically captures the current object by reference. It’s as if you’d written this:

window.PointerPressed([this](auto&&...)
{
    printf("%ls\n", m_value.c_str());
});

Of course, that capture is just a raw pointer and knows nothing of reference-counting. We can capture a strong reference as follows:

window.PointerPressed([this, strong = get_strong()](auto&&...)
{
    printf("%ls\n", m_value.c_str());
});

And you might even want to exclude the automatic capture of the current object as follows:

window.PointerPressed([strong = get_strong()](auto&&...)
{
    printf("%ls\n", strong->m_value.c_str());
});

Notice that the member variable is now accessed through the strong capture. Alternatively, you can capture a weak reference:

window.PointerPressed([weak = get_weak()](auto&&...)
{
    if (auto strong = weak.get())
    {
        printf("%ls\n", strong->m_value.c_str());
    }
});

Some folks prefer member functions over lambdas. That works just as well, but of course the syntax for member functions is slightly different. Here’s the potentially dangerous member function handler using a raw object pointer:

struct App : implements<App, IInspectable>
{
    hstring m_value{ L"Hello world" };

    void Register(Window& window)
    {
        window.PointerPressed({ this, &App::Handler });
    }

    void Handler(IInspectable const&, int)
    {
        printf("%ls\n", m_value.c_str());
    }
};

This is just the standard or conventional way to refer to an object and a corresponding member function. Of course, there’s no use calling get_strong or get_weak from within the handler as we did previously with the coroutine. It may well be too late, as the app object may already have been destroyed by the time the event is raised and the handler is called. Instead, the choice of weak or strong reference must be established at the point at which the handler is registered, while the app object is still known to be alive. Fortunately, the get_strong and get_weak functions can also be applied here. Consider again the event registration:

window.PointerPressed({ this, &App::Handler });

We can use the get_strong function in place of the raw this pointer as follows:

window.PointerPressed({ get_strong(), &App::Handler });

C++/WinRT ensures that the resulting delegate will hold a strong reference to the current object so that the handler can access any member variables without any concern. Similarly, the get_weak function may be used as follows:

window.PointerPressed({ get_weak(), &App::Handler });

In this case, C++/WinRT ensures that the resulting delegate holds a weak reference. This delegate will internally attempt to resolve it to a strong reference at the last minute and will only call the member function if a strong reference is acquired.
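Putting the pieces together, here’s a sketch of my own of the earlier App and main using the weak registration; with the app released before the event is raised, the handler is simply skipped rather than exploding:

struct App : implements<App, IInspectable>
{
    hstring m_value{ L"Hello world" };

    void Register(Window& window)
    {
        // Register while the app is known to be alive, handing the delegate a weak reference.
        window.PointerPressed({ get_weak(), &App::Handler });
    }

    void Handler(IInspectable const&, int)
    {
        printf("%ls\n", m_value.c_str());
    }
};

int main()
{
    init_apartment();
    Window window;
    auto app = make_self<App>();

    app->Register(window);
    app = nullptr; // the app's last strong reference goes away

    window.RaisePointerPressed(); // no boom - the weak reference fails to resolve and the handler is not called
}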

And that’s all I have for today. I trust you will find that C++/WinRT provides a great deal of flexibility when dealing with more complex lifetime management scenarios.

New in C++/WinRT: Async Cancellation Callback for Coroutines

Today I’d like to share another new feature in build 17709 of the Windows SDK, and it happens to be the most frequently requested feature for C++/WinRT’s coroutine support: a cancellation callback. WinRT has a very specific async pattern that provides cancellation support, but that support does not automatically flow to other async objects, as it does in some other async frameworks and libraries. The cancellation callback gives you an efficient mechanism to make this work and to integrate with or emulate those other async models.

Previously, cancellation within a coroutine was supported in two ways. The first is explicit:

IAsyncAction OneAsync()
{
    auto token = co_await get_cancellation_token();

    while (!token())
    {
        printf("Do some work for 1 second\n");
        co_await 1s;
    }
}

The co_await expression on get_cancellation_token produces a cancellation token with knowledge of the IAsyncAction that the coroutine is producing on your behalf. The token itself is callable, so you can invoke it to query the cancellation state, essentially polling for cancellation. This makes sense if you are performing some compute-bound operation.
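As a hedged sketch of my own showing how a caller might drive that polling loop (the timings here are arbitrary), requesting cancellation simply flips the token:

int main()
{
    init_apartment();
    auto async = OneAsync();

    // Let the loop run for a few iterations, then ask it to stop (needs <thread>).
    std::this_thread::sleep_for(3s);
    async.Cancel();

    try
    {
        async.get(); // completes once the loop observes the token
    }
    catch (hresult_canceled const&)
    {
        puts("cancelled");
    }
}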

The second option is implicit:

IAsyncAction TwoAsync()
{
    while (true)
    {
        printf("Do some work for 1 second\n");
        co_await 1s;
    }
}

At each co_await expression, the coroutine checks whether it has been cancelled prior to suspension and will short-circuit out of there rather than suspend. This works great, but if suspension occurs before cancellation, then the coroutine will not actually come to an end until the nested co_await expression completes and the outer coroutine subsequently returns or reaches another co_await expression.

These options also made it rather difficult to integrate with existing concurrency libraries as there was no preemptive hook by which cancellation might be propagated. No longer! There is now a third option that allows you to register a cancellation callback. Imagine you have a nested coroutine that does the actual work. Let’s use TwoAsync above for that. You can now write another coroutine that wraps it up and forwards cancellation preemptively as follows:

IAsyncAction ThreeAsync()
{
    auto token = co_await get_cancellation_token();
    auto nested = TwoAsync();

    token.callback([&]
    {
        nested.Cancel();
    });

    co_await nested;
}

Notice that the coroutine registers a lambda as the callback and then simply suspends and waits for the nested action to complete. There’s no need to poll for cancellation and the cancellation isn’t blocked indefinitely. Yay! Naturally, you can use this to interop with other coroutine or concurrency libraries that know nothing of C++/WinRT.
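And a quick caller sketch, again my own, to show the propagation in action; cancelling the outer action fires the callback, which immediately cancels the nested one:

int main()
{
    init_apartment();
    auto async = ThreeAsync();

    std::this_thread::sleep_for(3s); // needs <thread>
    async.Cancel(); // the registered callback cancels the nested TwoAsync right away

    try
    {
        async.get();
    }
    catch (hresult_canceled const&)
    {
        puts("cancelled");
    }
}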

And that’s all for today. Stay tuned for more about the next major update to C++/WinRT and do give it a try today.

New Features and Changes Coming to C++/WinRT: Header Isolation

Build 17709 of the Windows SDK (targeting RS5) includes many new features and improvements as well as a few breaking changes that I’d like to share with you today. This is the first major update to C++/WinRT since it officially launched in build 17134 of Windows (aka RS4).

Over the next few days I’m going to share more details about these improvements. Today I’d like to introduce header isolation.

C++/WinRT no longer depends on headers from the Windows SDK to compile. This is in line with the CRT and STL headers, which do not include any Windows headers, to improve standards compliance and avoid inadvertent dependencies. It also dramatically reduces the number of macros that a C++ developer must guard against. Removing the dependency on the Windows headers means that C++/WinRT is more portable and standards compliant, and furthers our efforts to make it a cross-compiler and cross-platform library. It also means that the C++/WinRT headers will never be mangled by macros.

If you previously relied on C++/WinRT to include various Windows headers, you will now have to include them yourself. It has always been good practice to include any headers you depend on explicitly and not rely on another library to include them for you.

Consider the following example using build 17134 of the Windows SDK:

#include <winrt/Windows.Data.Json.h>
using namespace winrt::Windows::Data::Json;

void sample()
{
    auto json = JsonValue::Parse(LR"({ "Key": "Value" })");
    auto object = json.GetObjectW();
    printf("%ls\n", object.GetNamedString(L"Key").c_str());
}

Notice that JsonValue’s GetObject method has a strange trailing W character in its name. The method really is named GetObject in the C++/WinRT headers. Unfortunately, in build 17134 those headers included restrictederrorinfo.h from the Windows SDK, which includes windows.h, which in turn includes wingdi.h, and that header defines a macro named GetObject. That’s just one example. There are too many such examples, and some are far more problematic.

As of build 17709, this is no longer a problem because the C++/WinRT headers no longer include those Windows headers. You can of course run into the same issue if you manually include windows.h but then at least it’s under your control and you can manage that yourself.
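With build 17709, and assuming you don’t include windows.h (or anything that drags in wingdi.h) yourself, the same sample can call the method by its real name:

#include <winrt/Windows.Data.Json.h>
using namespace winrt::Windows::Data::Json;

void sample()
{
    // No Windows headers are pulled in, so no GetObject macro gets in the way.
    auto json = JsonValue::Parse(LR"({ "Key": "Value" })");
    auto object = json.GetObject();
    printf("%ls\n", object.GetNamedString(L"Key").c_str());
}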

There are still a few headers from the Windows SDK that are included for intrinsics and numerics, but they should not introduce any problems. These dependencies will also likely disappear in the long run as we continue to push portability further.

On the other hand, you might rely on interop with the Windows headers for your particular project. Since the C++/WinRT headers no longer include Windows headers, this interop is disabled by default. If, for example, you need to implement a classic COM interface rooted in ::IUnknown, then you can easily re-enable this interop by including unknwn.h before any C++/WinRT headers. That way, the C++/WinRT base library will enable various hooks to ensure that this continues to work. Here’s what that might look like:

#include <unknwn.h>
#include <winrt/Windows.Foundation.h>

struct __declspec(uuid("22155cae-e0e2-49eb-8d20-daac1abda34d")) IClassic : ::IUnknown
{
    virtual HRESULT __stdcall Call() = 0;
};

using namespace winrt;
using namespace Windows::Foundation;

struct Sample : implements<Sample, IStringable, IClassic>
{
    hstring ToString()
    {
        return L"Sample";
    }

    HRESULT __stdcall Call() override
    {
        return S_OK;
    }
};
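Here’s a quick usage sketch of my own: because unknwn.h was included first, the implementation can be queried for the classic interface just like any other:

int main()
{
    init_apartment();

    IStringable stringable = make<Sample>();
    printf("%ls\n", stringable.ToString().c_str());

    // Query for the classic COM interface via the projection's as() helper.
    com_ptr<IClassic> classic = stringable.as<IClassic>();
    check_hresult(classic->Call());
}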

And that’s all for today. Stay tuned for more about the next major update to C++/WinRT!

C++/WinRT and Beyond

I have been hard at work on C++/WinRT for the RS5 release. There should be an update available with the upcoming Windows SDK for Windows Insider builds very soon. This will include most of the changes coming in RS5 and I will begin to share more about that as soon as it is available.

Beyond RS5, I am starting a new project that builds on C++/WinRT and I cannot wait to share more about that.

Please check back again soon as I will be sharing a lot more information here on kennykerr.ca regularly. I have also rather unexpectedly left Twitter. I may share that tale of woe someday, but feel free to reach out via kenny@kennykerr.ca or via the comments.

If you have specific questions about C++/WinRT, please use the Developer Community.

C++/WinRT: Creating Collections Simply and Efficiently

Previous: Coroutines and the Calling Context

A question recently came up on an internal alias about how to create a WinRT collection, specifically an IVectorView, to represent some allocation of floating-point values. Here’s my response. It illustrates how C++/WinRT helps you create collections very efficiently and with very little effort. WinRT collections are rather complicated internally, and C++/WinRT takes all of that complexity out of your hands, saving you a lot of time and effort.

I highly recommend using a std::vector for your data storage. You can then create an IVectorView quite simply:

std::vector<float> values{ 0.1f, 0.2f, 0.3f };
IVectorView<float> view = single_threaded_vector(std::move(values)).GetView();

for (auto&& value : view)
{
    printf("%.2f\n", value);
}

This is very efficient and avoids copies. If you need complete flexibility, you can implement IVectorView and IIterable yourself:

struct Sample : 
    implements<Sample, IVectorView<float>, IIterable<float>>
{
    float GetAt(uint32_t const index);
    uint32_t Size();
    bool IndexOf(float value, uint32_t& index);
    uint32_t GetMany(uint32_t startIndex, array_view<float> values) const;

    IIterator<float> First() const;
};

IVectorView<float> view = make<Sample>();

If you need a bit of help you can use the collection base classes (in RS5) to implement those interfaces:

struct Sample : 
    implements<Sample, IVectorView<float>, IIterable<float>>,
    vector_view_base<Sample, float>
{
    auto& get_container() const noexcept
    {
        return m_values;
    }

    std::vector<float> m_values{ 0.1f, 0.2f, 0.3f };
};

You can also return a custom container if you really don’t want to use std::vector:

struct Sample : 
    implements<Sample, IVectorView<float>, IIterable<float>>,
    vector_view_base<Sample, float>
{
    auto get_container() const noexcept
    {
        struct container
        {
            float const* const first;
            float const* const last;

            auto begin() const noexcept
            {
                return first;
            }

            auto end() const noexcept
            {
                return last;
            }
        };

        return container{ m_values.data(), m_values.data() + m_values.size() };
    }

    std::array<float, 3> m_values{ 0.2f, 0.3f, 0.4f };
};
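The same pattern extends to writable collections. Here’s a hedged sketch, assuming the RS5 vector_base helper follows the same get_container convention with both const and non-const overloads:

struct Sample :
    implements<Sample, IVector<float>, IVectorView<float>, IIterable<float>>,
    vector_base<Sample, float>
{
    auto& get_container() noexcept
    {
        return m_values;
    }

    auto& get_container() const noexcept
    {
        return m_values;
    }

    std::vector<float> m_values{ 0.1f, 0.2f, 0.3f };
};

IVector<float> vector = make<Sample>();
vector.Append(0.4f); // the base class provides the mutating members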

Additional collection base classes exist for all of the generic collections in the Windows Runtime. There are also a number of cool features that the collection base classes offer for customizing the implementation, and I’ll explore those in a future article. You can also learn more about this in our recent talk at Build 2018, where Brent and I introduced many of the productivity improvements in the version of C++/WinRT that ships with the Windows SDK, as well as the Visual Studio extension for C++/WinRT that makes it a lot easier to get started with a new project.

C++/WinRT: Coroutines and the Calling Context

Previous: Hosting the Windows Runtime

Let’s now return to the topic of coroutines and the issue of execution or calling context. Since coroutines can be used to introduce concurrency or deal with latency in other APIs, some confusion may arise as to the execution context of a given coroutine at any given point in time. Let’s clear a few things up.

Here’s a simple function that will print out some basic information about the calling thread:

void print_context()
{
    printf("thread:%d apartment:", GetCurrentThreadId());

    APTTYPE type;
    APTTYPEQUALIFIER qualifier;
    HRESULT const result = CoGetApartmentType(&type, &qualifier);

    if (result == S_OK)
    {
        puts(type == APTTYPE_MTA ? "MTA" : "STA");
    }
    else
    {
        puts("N/A");
    }
}

That’s by no means an exhaustive or foolproof dump of apartment information, but it is good enough for our purposes today. For the COM nerds out there, N/A is “not applicable” and not the other NA that you’re thinking of. 😊 Recall that there are two primary apartment models. A process has at most one multi-threaded apartment (MTA) and may have any number of single-threaded apartments (STA). Apartments are an unfortunate reality designed to accommodate COM’s remoting architecture but traditionally used to support COM objects that were not thread safe.

The single-threaded apartment (STA) is used by COM to ensure that objects are only ever called from the thread on which they were created. Naturally, this implies some mechanism to marshal calls from other threads back on to the apartment thread. STAs typically use a message loop or dispatcher queue for this. The multi-threaded apartment (MTA) is used by COM to indicate that no thread affinity exists but that marshaling is required if a call originates in some other apartment. The relationship between objects and threads is complex, so I’ll save that for another day. The ideal scenario is when an object proclaims that it is agile and thus free from apartment affinity.

Let’s use the print_context function to write a few interesting programs. Here’s one that calls print_context before and after using resume_background to move work onto a background (thread pool) thread.

IAsyncAction Async()
{
    print_context();
    co_await resume_background();
    print_context();
}

Consider also the following caller:

int main()
{
    init_apartment();
    Async().get();
}

The caller is important because it determines the original or calling context for the coroutine (or any function). In this case, init_apartment is called from main without any arguments. This means that the app’s primary thread will join the MTA. It thus does not require a message loop or dispatcher of any kind. It also means that the thread can happily block execution, as I have done here by using the blocking get function to wait for the coroutine to complete. Since the calling thread is an MTA thread, the coroutine begins execution on the MTA. The resume_background co_await expression then suspends the coroutine momentarily: the calling thread is released back to the caller, and the coroutine itself resumes as soon as a thread is available from the thread pool, allowing it to continue executing concurrently. Once on the thread pool, otherwise known as a background thread, print_context is once again called. Here’s what you might see if you were to run this program:

thread:18924 apartment:MTA
thread:9568 apartment:MTA

The thread identifiers don’t matter. What matters is that they are unique. Notice also that the thread temporarily provided by the thread pool was also an MTA thread. How can this be, since we did not call init_apartment on that thread? If you step into the print_context function, you will notice that the APTTYPEQUALIFIER distinguishes between these threads and identifies the thread pool thread’s association as being implicit. Again, I’ll leave a deeper discussion of apartments for another day. Suffice it to say that you can safely assume that a thread pool thread is an MTA thread in practice, provided the process is keeping the MTA alive by some other means.
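If you’re curious, here’s a small tweak to print_context, a sketch of my own, that also reports whether the MTA membership is implicit:

void print_context()
{
    printf("thread:%d apartment:", GetCurrentThreadId());

    APTTYPE type;
    APTTYPEQUALIFIER qualifier;
    HRESULT const result = CoGetApartmentType(&type, &qualifier);

    if (result == S_OK)
    {
        printf("%s", type == APTTYPE_MTA ? "MTA" : "STA");

        // The qualifier reveals whether the thread joined the MTA implicitly.
        puts(qualifier == APTTYPEQUALIFIER_IMPLICIT_MTA ? " (implicit)" : "");
    }
    else
    {
        puts("N/A");
    }
}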

The key is that the resume_background co_await expression will effectively switch the execution context of a coroutine to the thread pool regardless of what thread or apartment it was originally executing on. A coroutine should presume that it must not block the caller and should therefore ensure that it suspends before beginning any compute-bound work that could potentially block the calling thread. That does not mean that resume_background should always be used. A coroutine might exist purely to aggregate some other set of coroutines. In that case, any co_await expression within the coroutine may provide the necessary suspension to ensure that the calling thread is not blocked.
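Relatedly, if a coroutine does need to get back to its original context after doing background work, it can capture that context up front with apartment_context. Here’s a minimal sketch:

IAsyncAction Async()
{
    apartment_context context; // capture the calling apartment
    print_context();

    co_await resume_background(); // hop to the thread pool for compute-bound work
    print_context();

    co_await context; // resume back on the original apartment
    print_context();
}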

Now consider what happens if we change the caller, the app’s main function as follows:

int main()
{
    init_apartment(apartment_type::single_threaded);

    Async();

    MSG message;

    while (GetMessage(&message, nullptr, 0, 0))
    {
        DispatchMessage(&message);
    }
}

This seems reasonable enough. The primary thread becomes an STA thread, calls the Async function (without blocking in any way), and enters a message loop to ensure that the STA can service cross-apartment calls. The problem is that the MTA has not been created, so the threads owned by the thread pool will not join the MTA implicitly and thus cannot make use of COM services such as activation. Here’s what you might see if you were to run the program now:

thread:17552 apartment:STA
thread:19300 apartment:N/A

Notice that following the resume_background suspension, the coroutine no longer has an apartment context in which to execute code that relies on the COM runtime. If you really need an STA for your primary thread, this problem is easily solved by ensuring that the MTA is “always on” regardless.

int main()
{
    CO_MTA_USAGE_COOKIE mta{};
    check_hresult(CoIncrementMTAUsage(&mta));
    init_apartment(apartment_type::single_threaded);

    Async();

    MSG message;

    while (GetMessage(&message, nullptr, 0, 0))
    {
        DispatchMessage(&message);
    }
}

The CoIncrementMTAUsage function will ensure that the MTA is created. The calling thread becomes an implicit member of the MTA until or unless an explicit choice is made to join some other apartment, as is the case here. If I run the program again, I get the desired result:

thread:11276 apartment:STA
thread:9412 apartment:MTA

This is essentially the environment in which most modern Windows apps find themselves, but now you know a bit more about how you can control or create that environment for yourself. Join me next time as we continue to explore coroutines.