How async and future::~future shaped modern concurrency in C++ #
One of the ways to perform tasks in parallel is the standard library function std::async. It was introduced in the C++11 standard, alongside many other concurrency features, as a high-level interface. If you are already familiar with this function, I suggest you skip the brief introduction.
1) Introduction to async #
According to cppreference.com:
The function template std::async runs the function f asynchronously (potentially in a separate thread which might be part of a thread pool).
The reason the wording is tentative is that, unless you provide a launch flag, it is up to the implementation to decide whether to launch a new thread of execution.
//...
auto f1 = std::async(std::launch::async, foo); //Will be executed on a different thread
auto f2 = std::async(foo); //No guarantee
//...
There is another launch flag for async, and some other details, but they are not relevant to the story. The interesting part, the one that made me want to write this article, is those f1 and f2 variables. They are declared with the keyword auto, but what are they under the hood?
async returns the result of the function it executes through a std::future. Take a look at the following snippet of code. We launch a thread that will execute foo(), wait for the int, and retrieve the value to print it.
std::future<int> f1 = std::async(std::launch::async, foo);
f1.wait();
std::cout << "Done! Result: " << f1.get() << std::endl;
Actually, the get() method already blocks, so we can skip the call to wait().
auto f1 = std::async(std::launch::async, foo);
std::cout << "Done! Result: " << f1.get() << std::endl;
The two examples above aren't really very good. We send foo() to another thread while we do nothing. A better way of taking advantage of parallelism (if performance is the goal, that is; concurrency is not always about performance) would be something like this:
std::future<int> f1 = std::async(std::launch::async, foo);
int num = foo2();
std::cout << "Adding both results: " << num + f1.get() << std::endl;
Note that we are not taking into account exceptions, or other behaviours that would cause the code to fail.
This is much better, we execute both functions at the same time. Still, if foo() is still running by the time we try to add the two values, we will block our thread until it has finished. One of the drawbacks is that you have to anticipate when the result will be ready; if you fail to do so, you will block.
2) The issue #
Now that you have some basic knowledge of how they work we can get into the interesting part. Take a look at the following snippet of code:
//...
std::async(std::launch::async, foo1); // #1
std::async(std::launch::async, foo2); // #2
//...
How do you think this code will run? From what we have learnt above, these two functions should be launched on separate threads and, in theory, executed at the same time. In reality this code will always execute synchronously. Why? The reason is very simple and it has to do with the following:
- We are not saving the return value anywhere (the future).
- async demands that future::~future blocks.
I think the first one is easy to understand: in the previous code we are ignoring the return value. But what does the second one mean? And how is it related to not saving the value?
The statement itself is simple: when the destructor of a future obtained from async runs, it has to wait (blocking) for the task to finish before destroying the object. And what happens when we don't save the future? The returned temporary is destroyed right away, at the end of the full expression, and that blocks the thread.
In our example the intended use was to execute foo1 and foo2 concurrently. Instead, foo1 will start executing in a new thread, but before we even leave line #1 the temporary future has to be destroyed, so we wait there (blocking the current thread) until foo1 has finished and the object can be destroyed. The same happens at line #2.
Even if we saved the second return value, we would still not get concurrent execution.
std::async(std::launch::async, foo1);
auto f2 = std::async(std::launch::async, foo2);
// Executed sequentially: by this point we have already waited for foo1 to finish
std::async(std::launch::async, foo1);
auto f2 = std::async(std::launch::async, foo2);
auto f3 = std::async(std::launch::async, foo3);
// Only f2 and f3 run concurrently; before they start we have already waited for foo1
The solution is simple: always save the returned future. That way the destructor is called when the variable goes out of scope, not between the calls. On modern compilers you will even get a warning, because the return value of async is marked [[nodiscard]] (since C++20).
The funny part is that this behaviour is specific to async. From cppreference: “Note that the destructors of std::futures obtained by means other than a call to std::async never block.” Why just for this function?
Well, it's not like we have discovered anything new; this issue has been pointed out by various WG21 (the C++ committee) members and discussed at meetings.
This goes back to 2008, when C++11 (C++0x at the time) was still being drafted. Threads were going to be introduced and, as with any new feature, decisions had to be made about how they would be implemented, what the standard guarantees and what it doesn't, among many other things.
One of the decisions taken was to define the destructor of a thread like this:
If joinable() then detach(), otherwise no effects. [Note: Destroying a joinable thread can be unsafe if the thread accesses objects or the standard library unless the thread performs explicit synchronization to ensure that it does not access the objects or the standard library past their respective lifetimes. Terminating the process with _exit or quick_exit removes some of these obligations. - end note]
Detaching just means separating the thread of execution from the object. This is dangerous for obvious reasons: the thread can outlive parts of the code (if not all of it), possibly ending up with dangling pointers and all the consequences that follow.
This was brought up by Hans Boehm in 2008 with N2802: A plea to reconsider detach-on-destruction for thread objects. He points out that thread is not exception safe: even with a call to join at the end, an exception would cause the detach.
//...
std::thread t1(foo);
func_that_throws();
t1.join();
//...
With the initial approach the thread would be detached, since the thrown exception means join() is never reached. He proposed various solutions: a guard to join automatically (similar to the ones for std::mutex), restricting threads to accessing only permanent objects, or replacing detach() with a call to terminate. The last one was deemed the most sensible.
The proposal was accepted by the committee and made it into C++11, and it's how threads behave to this day.
Remember that async is just a higher-level, more abstract and easier-to-use kind of concurrent task that wraps a thread. But instead of calling terminate() if joinable(), they went with the join() approach.
If you read Hans Boehm's defence of this in N3679: Async() future destructors must wait, it makes sense. Detaching a thread is a dangerous move, so it should be done explicitly by the programmer; if not, waiting should be the default. And obviously, a supposedly friendly interface like async should have that default behaviour.
The dilemma comes from somewhere else. Herb Sutter complained about it in 2013, in N3630: async, ~future, and ~thread (Revision 1): this shouldn't be the default behaviour. Why? Because there is no way to know whether a future comes from async or from some other source.
Imagine you make a call to a function that belongs to a library, like this:
{
std::future<int> f1 = do_some_work();
}
Is this code going to block? Well, it depends: was it launched with async or with a thread? You just don't know. What if you have a GUI? You can't afford not knowing whether the call will block; you must remain responsive. The only way to find out is to look at the source code of the library where the function is implemented.
{
std::future<int> f1 = do_heavy_work();
//... more code
if(user_cancels()){
return;
}
}
In the code above, let's say you launch a task that might take a lot of resources and time. You launch it ahead of time because you think there is a high chance you will need the result. If it turns out you don't, and the implementation uses async, you will block at the return.
People like Herb Sutter were defending composability, arguing that future should be a non-blocking, composable type, but async was poisoning it. All this was discussed at a meeting in Santa Clara, N3709: Minutes for July 2013 Santa Clara SG1 Meeting.
SG1 is the study group of the C++ committee that focuses on concurrency.
Composable programming is a software design approach that treats code as modular, independent, and reusable components that can be assembled in various combinations to build complex systems. It emphasizes small, stateless, or self-contained pieces that communicate via APIs, allowing for high flexibility, scalability, and easy replacement of functionality. -Wikipedia
You can read the transcript of the meeting, I will quote some interesting parts.
Herb: there are two issues. Right now future is this type where i don’t know what it will do; defeats composability. One solution to that from our proposal was to have separate future & waiting_future types. What about std::async? that’s much of a lesser problem bc you can deprecate it. Seems like there’s an opportunity bc there’s low usage
Niklas: we can switch the names around. To the extent that libraries are using code, code will break no matter what. There doesn’t seem to be a clean solution that avoids breaking existing code.
Chandler: everyone intends std::future to be a non-blocking future. I’m happy with deprecating std::async and including waiting_future if that’s what people want.
Herb: I just care that ~future doesn’t block. It says that it doesn’t unless it comes from std::async.
Jeffrey: in the long run i would like to deprecate std::async. A bunch of functions are written assuming that ~future doesn’t block. Some instances violate that precondition. It would be nice to have a way to check that precondition. But there’s plenty of functions that have preconditions but don’t check them.
Chandler: we really can’t change the return type without forcing every library vendor to break their ABI. Not willing to do that when one library vendor says no.
More or less, everyone accepted that the problem existed, and they discussed possible solutions. But, as you might have gathered, changing the standard library is not that easy. Some proposals were made, like creating waiting_future and shared_waiting_future and making these the types that block. But that would require changing the return type of async to the waiting one, meaning that the code below would no longer compile. Imagine all the people who wrote this using C++11:
std::future<int> f1 = std::async(...);
//It would have to be rewritten to this:
//std::waiting_future<int> f1 = std::async(...);
And as Chandler says, that is a very big thing to force. Breaking users' code and vendors' ABIs wasn't in the interest of the committee. Deprecating async seemed like an option, but without a good replacement they would be removing a tool without offering a new one.
They realized they had hit a wall: nobody could provide a good enough solution that did little damage. So it had to stay; it was kept there as a backward-compatibility concession.
Instead of fixing it, the plan was to offer new tools, in the hope of async becoming soft-deprecated over time. No changes related to this were made for C++14 or C++17, but with C++20 we got a new kind of thread, jthread, which joins on destruction instead of terminating. stop_token and co_await were also added, providing ways to cancel and suspend tasks.
Some other features were added, but the most important one is coming now, in C++26: the execution control library. This is not any regular feature; it's a whole framework for managing asynchronous tasks in a composable way. It seems like Herb's desire finally materialized! Quoting him:
std::execution is what I call “C++’s async model”
It's a very new feature, it still lacks a lot of documentation, and I haven't tried it myself yet. It uses senders and receivers, allowing you to separate what to execute from where to execute it. They also allow chaining tasks with the | operator, something new for concurrency in C++. All of it integrates with the previously added stop_token to avoid freezes.