
The Old New Thing on C++/WinRT

In case you haven’t noticed, Raymond Chen has joined me in writing about C++/WinRT. He has published a great collection of tips and tricks internally and plans to publish them all publicly as time allows. Here are a few to get you started:

Detecting whether the -opt flag was passed to cppwinrt.exe: Using __has_include

How can I determine in a C++ header file whether C++/CX is enabled? How about C++/WinRT?

Why does my C++/WinRT project get errors of the form “Unresolved external symbol void* __cdecl winrt_make_YourNamespace_YourClass(void)”?

Why does my C++/WinRT project get errors of the form “‘winrt::impl::produce’: cannot instantiate abstract class, missing method GetBindingConnector”?

Why does my C++/WinRT project get errors of the form “consume_Something: function that returns ‘auto’ cannot be used before it is defined”?

Why does my C++/WinRT project get errors of the form “unresolved external symbol … consume_Something”?

Windows Runtime delegates and object lifetime in C++/WinRT

Meet C++/WinRT 2.0: resume_foreground Improvements

It turns out that resume_foreground was being a little too clever and could introduce deadlocks in some scenarios because it only suspended if the caller was not already on the dispatcher thread. This seemed like a good idea at the time, but being able to depend on stack unwinding and re-queuing turns out to be quite important for system stability, especially in OS code. Consider this simple example:

fire_and_forget UpdateAsync(TextBlock block)
{
    co_await resume_background();
    hstring message = L"Hello developers!";

    co_await resume_foreground(block.Dispatcher());
    block.Text(message);
}

Here we perform some complex calculation on a background thread and then naturally switch to the appropriate UI thread before updating a UI control. The resume_foreground function had some cleverness that looked something like this:

auto resume_foreground(...) noexcept
{
    struct awaitable
    {
        bool await_ready() const
        {
            return m_dispatcher.HasThreadAccess(); // <-- Cleverness...
        }
        void await_resume() const {}
        void await_suspend(coroutine_handle<> handle) const { ... }
    };
    return awaitable{ ... };
}

This has been updated as follows:

auto resume_foreground(...) noexcept
{
    struct awaitable
    {
        bool await_ready() const
        {
            return false; // <-- Queue without waiting
        }
        void await_resume() const {}
        void await_suspend(coroutine_handle<> handle) const { ... }
    };
    return awaitable{ ... };
}

This is analogous to the difference between SendMessage and PostMessage in classic desktop app development. The latter queues the work and then unwinds the stack without waiting for it to complete. This unwinding of the stack can be essential.

The resume_foreground function also initially supported only the CoreDispatcher tied to a CoreWindow, originally introduced with Windows 8. A more flexible and efficient dispatcher has since been introduced: the DispatcherQueue. It is nice in that you can create one for your own purposes. Consider a simple console app:

using namespace Windows::System;

fire_and_forget RunAsync(DispatcherQueue queue);

int main()
{
    auto controller = DispatcherQueueController::CreateOnDedicatedThread();
    RunAsync(controller.DispatcherQueue());
    getchar();
}

Here I'm creating a private queue thread and then passing this queue object to the coroutine. The coroutine can then use it to await, that is, to suspend and resume on this private thread. Another common use of the DispatcherQueue is to create a queue on the current UI thread for a traditional desktop or Win32 app.

DispatcherQueueController CreateDispatcherQueueController()
{
    DispatcherQueueOptions options
    {
        sizeof(DispatcherQueueOptions),
        DQTYPE_THREAD_CURRENT,
        DQTAT_COM_STA
    };

    ABI::Windows::System::IDispatcherQueueController* ptr{};
    check_hresult(::CreateDispatcherQueueController(options, &ptr)); // the Win32 function, not this wrapper
    return { ptr, take_ownership_from_abi };
}

Not only does this illustrate how Win32 functions may be called and incorporated into C++/WinRT projects, simply by calling the Win32-style CreateDispatcherQueueController function to create the controller and then transferring ownership of the resulting queue controller to the caller as a WinRT object; it is also precisely how you can support efficient and seamless queuing in your existing Petzold-style Win32 desktop app:

fire_and_forget RunAsync(DispatcherQueue queue);

int main()
{
    Window window;
    auto controller = CreateDispatcherQueueController();
    RunAsync(controller.DispatcherQueue());
    MSG message;

    while (GetMessage(&message, nullptr, 0, 0))
    {
        DispatchMessage(&message);
    }
}

This simple main function starts by creating a window. You can imagine this registers a window class and calls CreateWindow to create the top-level desktop window. The CreateDispatcherQueueController function is then called to create the queue controller before calling some coroutine with the dispatcher queue owned by this controller. A traditional message pump is then entered, where resumption of the coroutine naturally occurs on this thread. Having done so, you can return to the elegant world of coroutines for your async or message-based workflow within your app:

fire_and_forget RunAsync(DispatcherQueue queue)
{
    ... // Start on the calling thread

    co_await resume_foreground(queue);

    ... // Resume on the dispatcher thread
}

The resume_foreground function will always “queue” and then unwind the stack. You can also optionally set the resumption priority:

fire_and_forget RunAsync(DispatcherQueue queue)
{
    ...

    co_await resume_foreground(queue, DispatcherQueuePriority::High);

    ...
}

But if you only care about default queuing order then you can even await the queue itself and save yourself a few keystrokes:

fire_and_forget RunAsync(DispatcherQueue queue)
{
    ...

    co_await queue;

    ...
}

For the control freaks out there, you can even detect queue shutdown and handle that gracefully:

fire_and_forget RunAsync(DispatcherQueue queue)
{
    ...

    if (co_await queue)
    {
        ... // Resume on dispatcher thread
    }
    else
    {
        ... // Still on calling thread
    }
}

The co_await expression will return true, indicating that resumption will occur on the dispatcher thread. In other words, queuing was successful. Conversely, it will return false to indicate that execution remains on the calling thread because the queue’s controller is shutting down and is no longer serving queue requests.

As you can see, you have a great deal of power at your fingertips when you combine C++/WinRT with coroutines and especially when you do some old-school Petzold style desktop app development.

And that’s all for today. I hope you enjoy using C++/WinRT!

Meet C++/WinRT 2.0: Fewer Dependencies

I’ve always loved tools like Sysinternals where there is a single executable that you can simply copy onto your dev box and run. No need for an installer or a carefully managed tree of DLLs. It just works. Well cppwinrt.exe is like that as well. From the start, you could simply copy it onto any Windows 10 machine and it would just work. Still, there’s always room for improvement. Have a look at the dependencies reported by dumpbin for version 1 of cppwinrt:

> dumpbin /dependents cppwinrt.exe

ADVAPI32.dll
SHELL32.dll
api-ms-win-core-file-l1-1-0.dll
api-ms-win-core-processthreads-l1-1-0.dll
XmlLite.dll
api-ms-win-core-libraryloader-l1-2-0.dll
api-ms-win-core-processenvironment-l1-1-0.dll
RoMetadata.dll
SHLWAPI.dll
KERNEL32.dll
api-ms-win-core-rtlsupport-l1-1-0.dll
api-ms-win-core-heap-l1-1-0.dll
api-ms-win-core-localization-l1-2-0.dll
api-ms-win-core-timezone-l1-1-0.dll
api-ms-win-core-console-l1-1-0.dll
OLEAUT32.dll
api-ms-win-core-winrt-error-l1-1-0.dll
api-ms-win-core-winrt-error-l1-1-1.dll
api-ms-win-core-winrt-l1-1-0.dll
api-ms-win-core-winrt-string-l1-1-0.dll
api-ms-win-core-synch-l1-1-0.dll
api-ms-win-core-threadpool-l1-2-0.dll
api-ms-win-core-com-l1-1-0.dll
api-ms-win-core-com-l1-1-1.dll
api-ms-win-core-synch-l1-2-0.dll

In my defense, that’s not as bad as it looks. All of those DLLs are shipped with Windows 10 and those api-ms-win-core-xxx entries are really forwarding DLLs that support API sets. Still, there was one DLL in that list that caused a bit of trouble. RoMetadata.dll provides the implementation of the metadata parser shipped with the operating system. This is the implementation that practically everyone uses either directly or indirectly. We first hit a snag with this because the rather locked down server SKU that the build engineers at Microsoft wanted to use didn’t include this DLL. That turned out to be a Windows setup bug, but it got me thinking more about dependencies.

With C++/WinRT 2.0 I finally started writing a completely independent metadata parser in standard C++ to avoid this dependency and solve all kinds of trouble with this clunky old parser. A few guys on the team chipped in and this parser is now the foundation for all of our modern tooling. I then also ditched the forwarding DLLs to the point where dumpbin now reports a slightly smaller set of dependencies for version 2 of cppwinrt:

> dumpbin /dependents cppwinrt.exe

KERNEL32.dll
ADVAPI32.dll
XmlLite.dll
SHLWAPI.dll

The fun thing about this is that all of those DLLs are available, not only on Windows 10, but all the way down to Windows 7 and even Windows Vista. That means if you happen to have some crazy old build server running Windows 7, well then you can still run cppwinrt to generate the C++ headers for your project. And if you actually want to run C++/WinRT on Windows 7 you can even do that with a bit of work as well.

And that’s all for today. I hope you enjoy using C++/WinRT!

Meet C++/WinRT 2.0: Async Timeouts Made Easy

C++/WinRT took a big bet on C++ coroutines and that bet has paid off. Coroutines are in C++20 and the effect on writing concurrency code in C++ has been transformational. C++/WinRT was also the primary driver for the adoption of coroutines within Windows. Still, there are times when the fact that some API call is async is completely irrelevant and all you want is the result here and now. For that reason, C++/WinRT's implementation of the various WinRT async interfaces has always sported a get function, similar to what std::future provides:

int main()
{
    IAsyncAction async = ...
    async.get();
    puts("done!");
}

This get function will block indefinitely, waiting for the async object to complete. Async objects tend to be very short-lived, so this is often all you need. There are however times when this really doesn't cut it and you need to abandon the wait after some time has elapsed. Writing this has always been possible, thanks to the building blocks provided by WinRT, but it has never been easy. Well, C++/WinRT now makes it trivial by providing a wait_for function, again similar to what std::future provides:

int main()
{
    IAsyncAction async = ...

    if (async.wait_for(5s) == AsyncStatus::Completed)
    {
        puts("done");
    }
}

The wait_for in this example (using std::chrono literals) will wait up to 5 seconds for the async object to complete. If the comparison is favorable then you know that the async object completed successfully and you're done. If you are waiting for some result, then you can simply follow that with a call to the get function to retrieve the result:

int main()
{
    IAsyncOperation<int> async = ...

    if (async.wait_for(5s) == AsyncStatus::Completed)
    {
        printf("result %d\n", async.get());
    }
}

Since the async object has already completed, the get function will return the result immediately without any further wait. As you can see, the wait_for function returns the state of the async object. You can thus use this for more fine-grained control:

switch (async.wait_for(5s))
{
case AsyncStatus::Completed:
    printf("result %d\n", async.get());
    break;
case AsyncStatus::Canceled:
    puts("canceled");
    break;
case AsyncStatus::Error:
    puts("failed");
    break;
case AsyncStatus::Started:
    puts("still running");
    break;
}

As I mentioned, AsyncStatus::Completed means the async object completed successfully and you may call the get function for any result.

AsyncStatus::Canceled means the async object was canceled. Note that cancellation is typically requested by the caller, so it would be rare to handle this state. Typically, a canceled async object is simply discarded.

AsyncStatus::Error means the async object has failed in some way. You may call the get function to rethrow the exception if so desired.

Finally, AsyncStatus::Started means that the async object is still running. This is where it gets tricky. The WinRT async pattern does not allow multiple waits or waiters. That means that you cannot call wait_for in a loop. If the wait has effectively timed out, you are left with a few choices. You may abandon the object, or you may poll its status before calling get to retrieve any result, but it's best just to discard the object at this point.

And that’s all for today. I hope you enjoy using C++/WinRT!

Meet C++/WinRT 2.0: Optimizing Components

You may have noticed that all the 2.0 entries thus far have been focused on the component author. That's no coincidence: C++/WinRT 2.0 is very much focused on improving the correctness, efficiency, reliability, and productivity of the developer building a WinRT component. One of the improvements for component developers could not be made without introducing a breaking change, so what I'm describing today is opt-in, although it is enabled by default for new projects. An existing project can opt in using the C++/WinRT compiler's new -optimize command line option, or in Visual Studio by setting the project's “Optimized” option to true.

First, I'll describe why this is a cool optimization you should care about, and then I'll talk about how it's implemented so you'll understand why this is a breaking change worth applying to existing projects.

-optimize enables what is often called uniform construction. This is a feature that was long requested but eluded me for some time, and I am very pleased that we can finally rely on it. Uniform or unified construction is the notion that you can use the C++/WinRT language projection itself to create and use your intra-component types, types that are implemented by your component, without getting into weird loader issues, and that you can do so efficiently. This solves a few pitfalls that hampered developers building complex components in the past. Imagine you have the following WinRT class (defined in IDL):

namespace Component
{
    runtimeclass Class
    {
        Class();
        void Method();
        static void StaticMethod();
    }
}

Naturally, as a C++ developer familiar with using the C++/WinRT library you might want to use the class as follows:

using namespace winrt::Component;

Class c;
c.Method();
Class::StaticMethod();

And this would be perfectly reasonable, if this code didn’t reside within the same component that implements this class. You see, the thing about C++/WinRT is that as a language projection it shields the developer from the ABI. C++/WinRT never calls directly into the implementation. It always travels through the ABI. Now this is not the ABI that C++ compiler developers talk about. This is the COM-based ABI that WinRT defines. So that first line where you are constructing the Class object actually calls the RoGetActivationFactory function to retrieve the class or activation factory and then uses that factory to create the object. The last line likewise uses the factory to make what appears to be a static method call. Thankfully, C++/WinRT has a blazingly fast factory cache, so this isn’t a problem for apps. The trouble is that within a component you’ve just done something that is a little problematic.

Firstly, no matter how fast the C++/WinRT factory cache is, calling through RoGetActivationFactory or even subsequent calls through the factory cache will always be slower than calling directly into the implementation. A call to RoGetActivationFactory followed by IActivationFactory::ActivateInstance followed by QueryInterface is obviously not going to be as efficient as using a C++ new expression for a locally-defined type. As a consequence, seasoned C++/WinRT developers know to use the make or make_self helper functions when creating objects within a component:

// Class c;
Component::Class c = make<implementation::Class>();

But as you can see, this is not nearly as convenient or concise. Not only must you use a helper function to create the object, you must also disambiguate between the implementation type and the projected type. It’s also easy to forget to do so.

Secondly, using the projection to create the class means that its activation factory will be cached. Normally this is a wonderful thing but if the factory resides in the same DLL that is making the call then you’ve effectively pinned the DLL and prevented it from ever unloading. For many developers this probably doesn’t matter but some system components must support unloading, and this can become rather problematic.

So this is where the term uniform construction comes in. Regardless of whether the code resides in a project that is merely consuming the class or whether the code resides in the project that is actually implementing the class, the developer can freely use the same syntax to create the object:

// Component::Class c = make<implementation::Class>();
Class c;

When the component is built with -optimize, the call through the language projection will compile down to the same efficient call to the make function that directly creates the implementation type, avoiding the syntactic complexity, the performance hit of calling through the factory, and the problem of pinning the component in the process.

Uniform construction applies to any call that is served by the factory under the hood. Practically, that means this optimization serves both constructors and statics. Here’s the original example again:

Class c;
c.Method();
Class::StaticMethod();

Without -optimize, the first and last statements require calls through the factory object. With -optimize, neither do and those calls are compiled directly against the implementation and even have the potential of being inlined. This speaks to the other term often used when talking about -optimize, namely direct implementation access. Language projections are nice, but when you can directly access the implementation you can and should take advantage of it to produce the most efficient code possible. Now C++/WinRT will do this for you, without forcing you to leave the safety and productivity of the projection.

So why is this a breaking change? Well, the component must cooperate in order to allow the language projection to reach in and directly access its implementation types. As C++/WinRT is a header-only library, you can peek inside and see what’s going on. Without -optimize, the Class constructor and StaticMethod member are defined by the projection as follows:

namespace winrt::Component
{
    inline Class::Class() :
        Class(impl::call_factory<Class>([](auto&& f) { return f.template ActivateInstance<Class>(); }))
    {
    }
    inline void Class::StaticMethod()
    {
        impl::call_factory<Class, Component::IClassStatics>([&](auto&& f) { return f.StaticMethod(); });
    }
}

You don’t need to understand any of this (and remember never to rely on anything in the impl namespace), but it should be clear that both calls involve a call to some function named “call_factory”. That’s your clue that these calls involve the factory cache and are not directly accessing the implementation. With -optimize, these same functions are not defined at all! Instead, they are declared by the projection and their definitions are left up to the component. The component can then provide definitions that call directly into the implementation. This is where the breaking change comes in. Those definitions are generated for you when you use both -component and -optimize and appear in a file called Type.g.cpp where Type is the name of the WinRT class being implemented. That’s why you may hit various linker errors when you first enable -optimize in an existing project. You need to include that generated file into your implementation to stitch things up. In our example, the Class.h might look like this (regardless of whether -optimize is being used):

// Class.h
#pragma once
#include "Class.g.h"

namespace winrt::Component::implementation
{
    struct Class : ClassT<Class>
    {
        Class() = default;

        static void StaticMethod();
        void Method();
    };
}
namespace winrt::Component::factory_implementation
{
    struct Class : ClassT<Class, implementation::Class>
    {
    };
}

Your Class.cpp is where it all comes together:

#include "pch.h"
#include "Class.h"
#include "Class.g.cpp" // <-- Add this line!

namespace winrt::Component::implementation
{
    void Class::StaticMethod()
    {
    }

    void Class::Method()
    {
    }
}

As you can see, following the inclusion (and definition) of the implementation class, Class.g.cpp is included to provide the definitions of those functions that the projection left undefined. Here's what those definitions look like inside the Class.g.cpp file:

namespace winrt::Component
{
    Class::Class() :
        Class(make<Component::implementation::Class>())
    {
    }
    void Class::StaticMethod()
    {
        return Component::implementation::Class::StaticMethod();
    }
}

So this nicely completes the projection with efficient calls directly into the implementation, avoids the calls to the factory cache, and satisfies the linker.

The final thing that -optimize does for you is to change the implementation of your project's module.g.cpp, the file that helps you implement your DLL's DllGetActivationFactory and DllCanUnloadNow exports, in such a way that incremental builds tend to be much faster, by eliminating the strong type coupling that was required by version 1 of C++/WinRT. This is often referred to as type-erased factories. Without -optimize, the module.g.cpp file that is generated for your component starts off by including the definitions of all your implementation classes, the Class.h in this example. It then directly creates the implementation factory for each class as follows:

if (requal(name, L"Component.Class"))
{
    return winrt::detach_abi(winrt::make<winrt::Component::factory_implementation::Class>());
}

Again, you don't need to understand any of this, but it is useful to see that this requires the complete definition for any and all classes implemented by your component. This can have a dramatic effect on your inner loop, as any change to a single implementation will cause module.g.cpp to recompile. With -optimize, this is no longer the case. Instead, two things happen to the generated module.g.cpp file. The first is that it no longer includes any implementation classes. In this example, it will not include Class.h at all. Instead, it creates the implementation factories without any knowledge of their implementation:

void* winrt_make_Component_Class();

if (requal(name, L"Component.Class"))
{
    return winrt_make_Component_Class();
}

Obviously, there is no need to include their definitions, and it's up to the linker to resolve the winrt_make_Component_Class function's definition. Of course, you don't need to think about this, because the Class.g.cpp file that gets generated for you, and that you previously included to support uniform construction, also defines this function. Here's the entirety of the Class.g.cpp file that is generated for this example:

void* winrt_make_Component_Class()
{
    return winrt::detach_abi(winrt::make<winrt::Component::factory_implementation::Class>());
}
namespace winrt::Component
{
    Class::Class() :
        Class(make<Component::implementation::Class>())
    {
    }
    void Class::StaticMethod()
    {
        return Component::implementation::Class::StaticMethod();
    }
}

As you can see, the winrt_make_Component_Class function directly creates your implementation’s factory. This all means that you can happily change any given implementation and the module.g.cpp need not be recompiled at all. It is only when you add or remove WinRT classes that the module.g.cpp will be updated and need to be recompiled.

And that’s all for today. Stay tuned for more about C++/WinRT 2.0!

Meet C++/WinRT 2.0: Safe Queries During Destruction

Building on the notion of deferred destruction is the ability to safely query during destruction. COM is based on two central concepts. The first is reference counting and the second is querying for interfaces. IUnknown provides AddRef and Release, which we talked about last time, as well as QueryInterface. This function is heavily used by certain UI frameworks, like Xaml, to traverse the Xaml hierarchy as it simulates its composable type system. Consider a simple example:

struct MainPage : PageT<MainPage>
{
    ~MainPage()
    {
        DataContext(nullptr);
    }
};

This seems harmless, right? This Xaml page wants to clear its data context in its destructor, but DataContext is a property of the FrameworkElement base class and lives on the distinct IFrameworkElement interface. As a result, C++/WinRT must inject a call to QueryInterface to look up the correct vtable before it can call the DataContext property. Fortunately, C++/WinRT 2.0 has been hardened to support this. Let's look at the C++/WinRT implementation of Release (in a slightly simplified form):

uint32_t Release() noexcept
{
    uint32_t const remaining = subtract_reference();

    if (remaining == 0)
    {
        m_references = 1; // Debouncing!
        T::final_release(...);
    }

    return remaining;
}

As you can imagine, it first decrements the reference count and only acts if there are no outstanding references. However, before calling the static final_release function I described last time, it stabilizes the reference count by setting it to one. I like to call this debouncing, to borrow a term from electrical engineering. This is critical because once the final reference has been released, the reference count is unstable and unable to reliably support a call to QueryInterface.

Calling QueryInterface is dangerous because the reference count can conceivably grow indefinitely. Care must be taken only to call known code paths that will not prolong the life of the object. That’s up to you, but at least C++/WinRT will ensure that those QueryInterface calls can be made reliably. It does so through reference count stabilization. When the final reference has been released, the actual reference count is either zero or some wildly unpredictable value. The latter may occur if weak references are involved. Either way, this is unsustainable if a subsequent call to QueryInterface occurs because that will necessarily cause the reference count to increment temporarily – hence the reference to debouncing. Setting it to one ensures that a final call to Release will never again occur on this object, which is precisely what we want since the unique_ptr now owns the object, but bounded calls to QueryInterface/Release pairs will be safe. Consider a more interesting example:

struct MainPage : PageT<MainPage>
{
    ~MainPage()
    {
        DataContext(nullptr);
    }
    static fire_and_forget final_release(std::unique_ptr<MainPage> ptr)
    {
        co_await 5s;
        co_await resume_foreground(ptr->Dispatcher());
        ptr = nullptr;
    }
};

First up, the final_release function is called, notifying the implementation that it's time to clean up. This final_release happens to be a coroutine. It first waits on the thread pool for a few seconds, just for fun, before resuming on the page's dispatcher thread. This involves a query, since Dispatcher is a property of the DependencyObject base class. Now the page is finally deleted by virtue of assigning nullptr to the unique_ptr. This in turn calls the page's destructor. Inside the destructor we clear the data context, which as we know requires a query for the FrameworkElement base class.

All of this is possible because of the reference count debouncing, or stabilization, that is now provided by C++/WinRT 2.0! And that's all for today. Stay tuned for more.