Effective Modern C++, Chapters 6-8

[TOC]

This post records my study notes for Chapters 6-8 of Effective Modern C++.

CHAPTER 6 Lambda Expressions

Introduction to lambda expressions

Uses:

  1. predicates

    • STL “_if” algorithms (e.g., std::find_if, std::remove_if, std::count_if, etc.)

    • comparison functions (e.g., std::sort, std::nth_element, std::lower_bound, etc.)

    • custom deleters for std::unique_ptr and std::shared_ptr

    • condition variables in the threading API

  2. on-the-fly specification of callback functions

  3. interface adaptation functions

  4. context-specific functions for one-off calls

Distinguishing the concepts:

  • A lambda expression is just that: an expression.

    std::find_if(container.begin(), container.end(),
                 [](int val) { return 0 < val && val < 10; });
  • A closure is the runtime object created by a lambda.

  • A closure class is a class from which a closure is instantiated. Each lambda causes compilers to generate a unique closure class. The statements inside a lambda become executable instructions in the member functions of its closure class.

A lambda is often used to create a closure that’s used only as an argument to a function.

It's usually possible to have multiple closures of a closure type corresponding to a single lambda:

{
    int x;                                  // x is a local variable

    auto c1 =                               // c1 is a copy of the
        [x](int y) { return x * y > 55; };  // closure produced by the lambda
    auto c2 = c1;                           // c2 is a copy of c1
    auto c3 = c2;                           // c3 is a copy of c2
}

Informally, it's perfectly acceptable to blur the lines between lambdas, closures, and closure classes. But in the Items that follow, it's often important to distinguish what exists during compilation (lambdas and closure classes), what exists at runtime (closures), and how they relate to one another.

Item 31: Avoid default capture modes.

default by-reference capture mode

A by-reference capture causes a closure to contain a reference to a local variable or to a parameter that’s available in the scope where the lambda is defined. If the lifetime of a closure created from that lambda exceeds the lifetime of the local variable or parameter, the reference in the closure will dangle.

using FilterContainer = std::vector<std::function<bool(int)>>;
FilterContainer filters;                    // filtering funcs

void addDivisorFilter()
{
    auto calc1 = computeSomeValue1();
    auto calc2 = computeSomeValue2();
    auto divisor = computeDivisor(calc1, calc2);
    filters.emplace_back(                   // danger! ref to divisor will dangle
        [&](int value) { return value % divisor == 0; }
    );
}

divisor ceases to exist when addDivisorFilter returns, i.e., immediately after filters.emplace_back returns, so the function that's added to filters is essentially dead on arrival.

filters.emplace_back(
    [&divisor](int value)                   // danger! ref to divisor will still dangle!
    { return value % divisor == 0; }
);

With an explicit capture, it's easier to see that the viability of the lambda is dependent on divisor's lifetime. Also, writing out the name, divisor, reminds us to ensure that divisor lives at least as long as the lambda's closures.

The reason to avoid default capture modes is the reminder that explicit captures provide: you must verify that every explicitly named variable outlives the closure.

But what if I'm quite sure no problem can arise in this particular use?

If you know that a closure will be used immediately (e.g., by being passed to an STL algorithm) and won’t be copied, there is no risk that references it holds will outlive the local variables and parameters in the environment where its lambda is created. In that case, you might argue, there’s no risk of dangling references, hence no reason to avoid a default by-reference capture mode.

template<typename C>
void workWithContainer(const C& container)
{
    auto calc1 = computeSomeValue1();              // as above
    auto calc2 = computeSomeValue2();              // as above
    auto divisor = computeDivisor(calc1, calc2);   // as above

    using ContElemT = typename C::value_type;      // type of elements in container

    using std::begin;                              // for genericity
    using std::end;

    // C++14 with auto:
    // if (std::all_of(begin(container), end(container),
    //                 [&](const auto& value)
    //                 { return value % divisor == 0; }))
    if (std::all_of(              // if all values in container are multiples of divisor...
            begin(container), end(container),
            [&](const ContElemT& value)
            { return value % divisor == 0; })
        ) {
        // they are...
    } else {
        // at least one isn't...
    }
}

The risk: if the lambda is copy-and-pasted into a context where its closure could outlive divisor, you'd be back in dangle-city.

default by-value capture mode

filters.emplace_back(                       // now divisor can't dangle
    [=](int value) { return value % divisor == 0; }
);

Risk 1: capturing a pointer by value

If you capture a pointer by value, you copy the pointer into the closures arising from the lambda, but you don't prevent code outside the lambda from deleting the pointer and causing your copies to dangle.

Even if you use smart pointers instead of raw pointers to avoid this problem, it's hard to guarantee that everyone who uses the lambda will avoid raw pointers. A sketch of how this bites follows.
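
Below is a minimal sketch of Risk 1; the body of addDivisorFilter and the pointer names here are hypothetical, not from the book:

#include <functional>
#include <memory>
#include <vector>

std::vector<std::function<bool(int)>> filters;

void addDivisorFilter()
{
    auto upDivisor = std::make_unique<int>(5);  // hypothetical divisor owned by a unique_ptr
    int* pDivisor = upDivisor.get();            // raw pointer into *upDivisor

    filters.emplace_back(
        [pDivisor](int value)                   // copies the pointer, not the int it points to
        { return value % *pDivisor == 0; }
    );
}                                               // *upDivisor destroyed here; the filter just
                                                // added now dereferences a dangling pointer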

Risk 2: lambdas used inside a class

class Widget {
public:
    // ctors, etc.
    void addFilter() const;                 // add an entry to filters
private:
    int divisor;                            // used in Widget's filter
};

void Widget::addFilter() const
{
    filters.emplace_back(                   // compiles, but doesn't capture divisor! (see below)
        [=](int value) { return value % divisor == 0; }
    );
}

// explicit capture:
void Widget::addFilter() const
{
    filters.emplace_back(
        [divisor](int value)                // error! no local divisor to capture
        { return value % divisor == 0; }
    );
}

Captures apply only to non-static local variables (including parameters) visible in the scope where the lambda is created.

The underlying reason is that what gets captured inside a member function is the this pointer. The [=] version above is effectively:

void Widget::addFilter() const
{
    auto currentObjectPtr = this;
    filters.emplace_back(
        [currentObjectPtr](int value)
        { return value % currentObjectPtr->divisor == 0; }
    );
}

Here is an example of how using such a lambda inside a class goes wrong:

using FilterContainer =                     // as before
    std::vector<std::function<bool(int)>>;
FilterContainer filters;                    // as before

void doSomeWork()
{
    auto pw = std::make_unique<Widget>();   // create Widget
    pw->addFilter();                        // add filter that uses Widget::divisor
}                                           // destroy Widget; filters now holds a dangling pointer (this)!

This particular problem can be solved by making a local copy of the data member you want to capture and then capturing the copy:

void Widget::addFilter() const
{
    auto divisorCopy = divisor;             // copy data member
    filters.emplace_back(
        [divisorCopy](int value)            // capture the copy, use the copy
        { return value % divisorCopy == 0; }
    );
}

// this also works:
void Widget::addFilter() const
{
    auto divisorCopy = divisor;             // copy data member
    filters.emplace_back(
        [=](int value)                      // capture the copy, use the copy
        { return value % divisorCopy == 0; }
    );
}

// C++14 can be simpler:
void Widget::addFilter() const
{
    filters.emplace_back(                   // C++14:
        [divisor = divisor](int value)      // copy divisor to closure
        { return value % divisor == 0; }    // use the copy
    );
}

Risk 3: objects with static storage duration

An additional drawback to default by-value captures is that they can suggest that the corresponding closures are self-contained and insulated from changes to data outside the closures.

static storage duration objects are defined at global or namespace scope or are declared static inside classes, functions, or files. These objects can be used inside lambdas, but they can’t be captured. Yet specification of a default by-value capture mode can lend the impression that they are.

void addDivisorFilter()
{
    static auto calc1 = computeSomeValue1();       // now static
    static auto calc2 = computeSomeValue2();       // now static
    static auto divisor =                          // now static
        computeDivisor(calc1, calc2);

    filters.emplace_back(
        [=](int value)                  // captures nothing! refers to above static
        { return value % divisor == 0; }
    );
    ++divisor;                          // modify divisor
}

Practically speaking, this lambda captures divisor by reference, a direct contradiction to what the default by-value capture clause seems to imply.

Things to Remember

  • Default by-reference capture can lead to dangling references.
  • Default by-value capture is susceptible to dangling pointers (especially this), and it misleadingly suggests that lambdas are self-contained.

Item 32: Use init capture to move objects into closures.

Using init capture in C++14

Unlike C++11, C++14 offers direct support for moving objects into closures.

In C++11, there are ways to approximate move capture.

Using an init capture makes it possible for you to specify

  1. the name of a data member in the closure class generated from the lambda and
  2. an expression initializing that data member.
class Widget {                              // some useful type
public:
    bool isValidated() const;
    bool isProcessed() const;
    bool isArchived() const;
private:
    // ...
};

auto pw = std::make_unique<Widget>();       // create Widget
// configure *pw
auto func = [pw = std::move(pw)]            // init data mbr in closure w/ std::move(pw)
            { return pw->isValidated()
                     && pw->isArchived(); };

pw = std::move(pw): To the left of the = is the name of the data member in the closure class you’re specifying, and to the right is the initializing expression.

If the "configure *pw" step isn't necessary, the closure class's data member can be directly initialized by std::make_unique:

auto func = [pw = std::make_unique<Widget>()]  // init data mbr in closure w/ result of call to make_unique
            { return pw->isValidated() && pw->isArchived(); };

Another name for init capture is generalized lambda capture.

Emulating init capture in C++11

A lambda expression is simply a way to cause a class to be generated and an object of that type to be created, so you can write the class by hand in C++11:

class IsValAndArch {                        // "is validated and archived"
public:
    using DataType = std::unique_ptr<Widget>;

    explicit IsValAndArch(DataType&& ptr)
        : pw(std::move(ptr)) {}

    bool operator()() const
    { return pw->isValidated() && pw->isArchived(); }

private:
    DataType pw;
};

auto func = IsValAndArch(std::make_unique<Widget>());

Summarized as a recipe, move capture can be emulated in C++11 by:

  1. moving the object to be captured into a function object produced by std::bind and
  2. giving the lambda a reference to the “captured” object.

For example:

// C++14
std::vector<double> data;                   // object to be moved into closure
// ...populate data...
auto func = [data = std::move(data)]        // C++14 init capture
            { /* uses of data */ };

// C++11
std::vector<double> data;                   // as above
// ...populate data...
auto func = std::bind(                      // C++11 emulation of init capture
    [](const std::vector<double>& data)     // reference-to-const
    { /* uses of data */ },
    std::move(data)
);

A bind object contains copies of all the arguments passed to std::bind. For each lvalue argument, the corresponding object in the bind object is copy constructed. For each rvalue, it’s move constructed.
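
A small sketch of that rule; process and the strings below are hypothetical, for illustration only:

#include <functional>
#include <string>
#include <utility>

void process(const std::string&) {}               // hypothetical callee

int main()
{
    std::string s1("lvalue source");
    std::string s2("rvalue source");

    auto b1 = std::bind(process, s1);             // s1 is copy-constructed into the bind object
    auto b2 = std::bind(process, std::move(s2));  // s2 is move-constructed into the bind object

    b1();                                         // calls process with the stored copy
    b2();                                         // calls process with the moved-in string
}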

The move-constructed copy of data inside the bind object is not const, however, so to prevent that copy of data from being modified inside the lambda, the lambda’s parameter is declared reference-to-const.

If the lambda were declared mutable, operator() in its closure class would not be declared const, and it would be appropriate to omit const in the lambda’s parameter declaration:

auto func =
    std::bind(                              // C++11 emulation of init capture
        [](std::vector<double>& data) mutable  // for mutable lambda
        { /* uses of data */ },
        std::move(data)
    );

The practice of emulating init capture with std::bind in C++11 can be summarized as follows:

  • It’s not possible to move-construct an object into a C++11 closure, but it is possible to move-construct an object into a C++11 bind object.
  • Emulating move-capture in C++11 consists of move-constructing an object into a bind object, then passing the move-constructed object to the lambda by reference.
  • Because the lifetime of the bind object is the same as that of the closure, it's possible to treat objects in the bind object as if they were in the closure. In essence, the emulation synchronizes the argument's lifetime with the lambda's, so it merely looks like init capture.

Things to Remember

  • Use C++14’s init capture to move objects into closures.
  • In C++11, emulate init capture via hand-written classes or std::bind.

Item 33: Use decltype on auto&& parameters to std::forward them.

C++14 supports generic lambdas: lambdas that use auto in their parameter specifications. operator() in the lambda's closure class is then a template. Concretely:

auto f = [](auto x){ return func(normalize(x)); };

// the closure class's function call operator looks like this:
class SomeCompilerGeneratedClassName {
public:
    template<typename T>
    auto operator()(T x) const
    { return func(normalize(x)); }
    // ...other closure class functionality...
};

The problem: if normalize treats lvalues differently from rvalues, this lambda isn't written properly, because it always passes an lvalue (the parameter x) to normalize, even if the argument that was passed to the lambda was an rvalue.

The fix is a universal reference plus perfect forwarding, but how do we specify the T in std::forward<T>?

auto f = [](auto&& x)
         { return func(normalize(std::forward<???>(x))); };

Goal: distinguish whether the argument passed in was an lvalue or an rvalue, so the appropriate overload is chosen.

Step 1: for universal references, if an lvalue argument is passed to a universal reference parameter, the type of that parameter becomes an lvalue reference. If an rvalue is passed, the parameter becomes an rvalue reference. This means that in our lambda, we can determine whether the argument passed was an lvalue or an rvalue by inspecting the type of the parameter x, which is what the next step does.

Step 2: if an lvalue was passed, decltype(x) will produce an lvalue reference type; if an rvalue was passed, decltype(x) will produce an rvalue reference type.

Step 3: when calling std::forward, convention dictates that the type argument be an lvalue reference to indicate an lvalue and a non-reference to indicate an rvalue. In our lambda, if x is bound to an lvalue, decltype(x) yields an lvalue reference, which conforms to convention. If x is bound to an rvalue, decltype(x) yields an rvalue reference instead of the customary non-reference. However, thanks to reference collapsing, instantiating std::forward with an rvalue reference type gives the same result as instantiating it with a non-reference type:

Widget&& forward(Widget& param)             // instantiation of std::forward when T is Widget
{
    return static_cast<Widget&&>(param);
}

// before reference collapsing:
Widget&& && forward(Widget& param)          // instantiation of std::forward when T is Widget&&
{
    return static_cast<Widget&& &&>(param);
}

// after reference collapsing: same as when T is Widget
Widget&& forward(Widget& param)             // instantiation of std::forward when T is Widget&&
{
    return static_cast<Widget&&>(param);
}

So for both lvalues and rvalues, passing decltype(x) to std::forward gives us the result we want. Our perfect-forwarding lambda can therefore be written like this:

auto f =
    [](auto&& param)
    {
        return
            func(normalize(std::forward<decltype(param)>(param)));
    };

C++14 lambdas can also be variadic:

auto f =
    [](auto&&... params)
    {
        return
            func(normalize(std::forward<decltype(params)>(params)...));
    };

Things to Remember

  • Use decltype on auto&& parameters to std::forward them.

Item 34: Prefer lambdas to std::bind.

lambdas are more readable

An alarm-setting example (an hour from now, sound s for 30 seconds):

// typedef for a point in time
using Time = std::chrono::steady_clock::time_point;
enum class Sound { Beep, Siren, Whistle };
// typedef for a length of time
using Duration = std::chrono::steady_clock::duration;
// at time t, make sound s for duration d
void setAlarm(Time t, Sound s, Duration d);

// lambda version:
// setSoundL ("L" for "lambda") is a function object allowing a
// sound to be specified for a 30-sec alarm to go off an hour
// after it's set
auto setSoundL =
    [](Sound s)
    {
        // make std::chrono components available w/o qualification
        using namespace std::chrono;
        setAlarm(steady_clock::now() + hours(1),  // alarm to go off
                 s,                               // in an hour for
                 seconds(30));                    // 30 seconds
    };

// C++14 with std::literals:
auto setSoundL =
    [](Sound s)
    {
        using namespace std::chrono;
        using namespace std::literals;            // for C++14 suffixes
        setAlarm(steady_clock::now() + 1h,        // C++14, but same meaning as above
                 s,
                 30s);
    };

// the std::bind version, for comparison
// (simplified, and incorrect!):
using namespace std::chrono;                      // as above
using namespace std::literals;
using namespace std::placeholders;                // needed for use of "_1"
auto setSoundB =                                  // "B" for "bind"
    std::bind(setAlarm,
              steady_clock::now() + 1h,           // incorrect! see below
              _1,
              30s);

What _1 stands for is not identified in the call to std::bind, so readers have to consult the setAlarm declaration to determine what kind of argument to pass to setSoundB.

The timing of argument expression evaluation differs

Why the code above is incorrect: we want the alarm to go off an hour after invoking setAlarm. In the std::bind call, however, steady_clock::now() + 1h is passed as an argument to std::bind, not to setAlarm. That means that the expression will be evaluated when std::bind is called, and the time resulting from that expression will be stored inside the resulting bind object. As a consequence, the alarm will be set to go off an hour after the call to std::bind, not an hour after the call to setAlarm!

The fix:

// C++14
auto setSoundB =
    std::bind(setAlarm,
              std::bind(std::plus<>(), steady_clock::now(), 1h),
              _1,
              30s);

// C++11: the type for std::plus must be specified
using namespace std::chrono;                      // as above
using namespace std::placeholders;
auto setSoundB =
    std::bind(setAlarm,
              std::bind(std::plus<steady_clock::time_point>(),
                        steady_clock::now(),
                        hours(1)),
              _1,
              seconds(30));

Overload support

When setAlarm is overloaded, a new issue arises.

enum class Volume { Normal, Loud, LoudPlusPlus };
void setAlarm(Time t, Sound s, Duration d, Volume v);  // overload

The lambda version needs no changes, but the std::bind version no longer compiles:

auto setSoundB =                            // error! which setAlarm?
    std::bind(setAlarm,
              std::bind(std::plus<>(),
                        steady_clock::now(),
                        1h),
              _1,
              30s);

To make it compile, setAlarm must be cast to the proper function-pointer type:

using SetAlarm3ParamType = void(*)(Time t, Sound s, Duration d);
auto setSoundB =                            // now okay
    std::bind(static_cast<SetAlarm3ParamType>(setAlarm),
              std::bind(std::plus<>(),
                        steady_clock::now(),
                        1h),
              _1,
              30s);

Inlining

Lambdas are more amenable to compiler inlining than std::bind: calls through the bind object go through a function pointer, and compilers are less likely to inline function calls made through function pointers.

It’s thus possible that using lambdas generates faster code than using std::bind.

Programming complexity

The following example speaks for itself.

// C++14 lambda version
auto betweenL =
    [lowVal, highVal]
    (const auto& val)                       // C++14
    { return lowVal <= val && val <= highVal; };

// C++11 lambda version
auto betweenL =                             // C++11 version
    [lowVal, highVal]
    (int val)
    { return lowVal <= val && val <= highVal; };

// C++14 std::bind version
using namespace std::placeholders;          // as above
auto betweenB =
    std::bind(std::logical_and<>(),
              std::bind(std::less_equal<>(), lowVal, _1),
              std::bind(std::less_equal<>(), _1, highVal));

// C++11 std::bind version
auto betweenB =                             // C++11 version
    std::bind(std::logical_and<bool>(),
              std::bind(std::less_equal<int>(), lowVal, _1),
              std::bind(std::less_equal<int>(), _1, highVal));

Specifying how arguments are stored and passed

enum class CompLevel { Low, Normal, High };        // compression level

Widget compress(const Widget& w,                   // make compressed copy of w
                CompLevel lev);

// std::bind version:
Widget w;
using namespace std::placeholders;
auto compressRateB = std::bind(compress, w, _1);

when we pass w to std::bind, it has to be stored for the later call to compress. It’s stored inside the object compressRateB, but how is it stored—by value or by reference? It makes a difference, because if w is modified between the call to std::bind and a call to compressRateB, storing w by reference will reflect the changes, while storing it by value won’t.

std::bind always copies its arguments, but callers can achieve the effect of having an argument stored by reference by applying std::ref to it. The result of

auto compressRateB = std::bind(compress, std::ref(w), _1);

is that compressRateB acts as if it holds a reference to w, rather than a copy.

With a lambda, whether w is captured by value or by reference is explicit in the source code:

auto compressRateL =                        // w is captured by value;
    [w](CompLevel lev)                      // lev is passed by value
    { return compress(w, lev); };

compressRateL(CompLevel::High);             // arg is passed by value

compressRateB(CompLevel::High);             // how is arg passed?

Where std::bind still has a place in C++11

In C++14, there are no reasonable use cases for std::bind.

In C++11, however, std::bind can be justified in two constrained situations:

  • Move capture. See Item 32.
  • Polymorphic function objects. Because the function call operator on a bind object uses perfect forwarding, it can accept arguments of any type. This can be useful when you want to bind an object with a templatized function call operator.
class PolyWidget {
public:
    template<typename T>
    void operator()(const T& param);
};

// std::bind can bind a PolyWidget as follows:
PolyWidget pw;
auto boundPW = std::bind(pw, _1);

// boundPW can then be called with different types of arguments:
boundPW(1930);                              // pass int to PolyWidget::operator()
boundPW(nullptr);                           // pass nullptr to PolyWidget::operator()
boundPW("Rosebud");                         // pass string literal to PolyWidget::operator()

// in C++14, a generic lambda does the same job:
auto boundPW = [pw](const auto& param)
               { pw(param); };

Things to Remember

  • Lambdas are more readable, more expressive, and may be more efficient than using std::bind.
  • In C++11 only, std::bind may be useful for implementing move capture or for binding objects with templatized function call operators.

CHAPTER 7 The Concurrency API

Item 35: Prefer task-based programming to thread-based.

Two ways to run a function doAsyncWork asynchronously:

  1. thread-based approach

    int doAsyncWork();
    std::thread t(doAsyncWork);

  2. task-based approach

    auto fut = std::async(doAsyncWork); // "fut" for "future"

Advantages of the task-based approach:

  1. return values: the future returned from std::async offers the get function (see the sketch below).
  2. exceptions: if doAsyncWork emits an exception, get provides access to that, too.
  3. a higher level of abstraction: the task-based approach frees you from the details of thread management.

In terms of resource management, there's far less to worry about than with std::thread.
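
A minimal sketch of points 1 and 2, assuming a hypothetical doAsyncWork that throws:

#include <future>
#include <iostream>
#include <stdexcept>

int doAsyncWork() { throw std::runtime_error("boom"); }  // hypothetical

int main()
{
    auto fut = std::async(doAsyncWork);
    try {
        std::cout << fut.get() << '\n';       // get returns doAsyncWork's result...
    } catch (const std::runtime_error& e) {
        std::cout << "caught: " << e.what() << '\n';  // ...or rethrows its exception
    }
}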

To summarize, “thread” has three meanings in concurrent C++ software:

  • Hardware threads are the threads that actually perform computation. Contemporary machine architectures offer one or more hardware threads per CPU core.
  • Software threads (also known as OS threads or system threads) are the threads that the operating system manages across all processes and schedules for execution on hardware threads. It’s typically possible to create more software threads than hardware threads, because when a software thread is blocked (e.g., on I/O or waiting for a mutex or condition variable), throughput can be improved by executing other, unblocked, threads.
  • std::threads are objects in a C++ process that act as handles to underlying software threads. Some std::thread objects represent “null” handles.

1. thread exhaustion and load balancing

Software threads are a limited resource. If you try to create more than the system can provide, a std::system_error exception is thrown. This is true even if the function you want to run can't throw, e.g., even if doAsyncWork is noexcept.
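
A sketch that provokes this exception by creating sleeping threads until creation fails (illustrative only; don't run it on a machine you care about):

#include <chrono>
#include <cstdio>
#include <system_error>
#include <thread>
#include <vector>

int main()
{
    std::vector<std::thread> ts;
    try {
        for (;;) {                            // keep creating sleeping threads until...
            ts.emplace_back([]{ std::this_thread::sleep_for(std::chrono::minutes(1)); });
        }
    } catch (const std::system_error& e) {    // ...the system runs out of threads
        std::printf("creation failed after %zu threads: %s\n", ts.size(), e.what());
    }
    for (auto& t : ts) t.join();              // wait for the sleepers to finish
}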

Possible workarounds:

  • One approach is to run doAsyncWork on the current thread, but that could lead to unbalanced loads.
  • Another option is to wait for some existing software threads to complete and then try to create a new std::thread again, but it's possible that the existing threads are waiting for an action that doAsyncWork is supposed to perform (e.g., produce a result or notify a condition variable), which would create a deadlock.

2. oversubscription

Definition: there are more ready-to-run (i.e., unblocked) software threads than hardware threads; in other words, too many threads are competing to run.

When that happens, the thread scheduler (typically part of the OS) time-slices the software threads on the hardware. When one thread's time-slice is finished and another's begins, a context switch is performed. Such context switches increase the overall thread management overhead of the system, and they can be particularly costly when the hardware thread on which a software thread is scheduled is on a different core than was the case for the software thread during its last time-slice. In that case,

(1) the CPU caches are typically cold for that software thread (i.e., they contain little data and few instructions useful to it), and

(2) the running of the “new” software thread on that core “pollutes” the CPU caches for “old” threads that had been running on that core and are likely to be scheduled to run there again.

Avoiding oversubscription is difficult; manual tuning alone is unrealistic,

because the optimal ratio of software to hardware threads depends on how often the software threads are runnable, and that can change dynamically, e.g., when a program goes from an I/O-heavy region to a computation-heavy region. The best ratio of software to hardware threads is also dependent on the cost of context switches and how effectively the software threads use the CPU caches.

3. machine architecture

Furthermore, the number of hardware threads and the details of the CPU caches (e.g., how large they are and their relative speeds) depend on the machine architecture.
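
For what it's worth, the hardware-thread count can at least be queried portably; a minimal sketch:

#include <iostream>
#include <thread>

int main()
{
    // number of hardware threads the implementation reports;
    // may return 0 if the value isn't computable
    unsigned n = std::thread::hardware_concurrency();
    std::cout << n << " hardware threads\n";
}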

Switching to the task-based approach hands these hard-to-determine (and hard-to-port) details over to the system:

auto fut = std::async(doAsyncWork);

This call shifts the thread management responsibility to the implementer of the C++ Standard Library.

Why does std::async help? Because it schedules from the vantage point of the whole runtime system, whereas we as users have only limited information: std::async, when called in this form (i.e., with the default launch policy), doesn't guarantee that it will create a new software thread. Rather, it permits the scheduler to arrange for the specified function to be run on the thread requesting doAsyncWork's result (i.e., on the thread calling get or wait on fut), and reasonable schedulers take advantage of that freedom if the system is oversubscribed or is out of threads.

State-of-the-art thread schedulers employ system-wide thread pools to avoid oversubscription, and they improve load balancing across hardware cores through work-stealing algorithms.

How can you ensure that a high-priority task (or one that other tasks depend on) really gets executed?

Pass the std::launch::async launch policy to std::async; that ensures the function really executes on a different thread (see Item 36).

Scenarios for the thread-based approach

  • You need access to the API of the underlying threading implementation.

    The C++ concurrency API is typically implemented using a lower-level platform-specific API, usually pthreads or Windows' Threads. Those APIs are currently richer than what C++ offers. (For example, C++ has no notion of thread priorities or affinities.) To provide access to the API of the underlying threading implementation, std::thread objects typically offer the native_handle member function. There is no counterpart to this functionality for std::futures (i.e., for what std::async returns).

  • You need to and are able to optimize thread usage for your application.
    This could be the case, for example, if you’re developing server software with a known execution profile that will be deployed as the only significant process on a machine with fixed hardware characteristics.

  • You need to implement threading technology beyond the C++ concurrency API, e.g., thread pools on platforms where your C++ implementations don’t offer them.

Things to Remember

  • The std::thread API offers no direct way to get return values from asynchronously run functions, and if those functions throw, the program is terminated.
  • Thread-based programming calls for manual management of thread exhaustion, oversubscription, load balancing, and adaptation to new platforms.
  • Task-based programming via std::async with the default launch policy handles most of these issues for you.

Item 36: Specify std::launch::async if asynchronicity is essential.

There are two standard launch policies for std::async, assuming f is the function passed to it:

  • The std::launch::async launch policy means that f must be run asynchronously, i.e., on a different thread.
  • The std::launch::deferred launch policy means that f may run only when get or wait is called on the future returned by std::async. That is, f's execution is deferred until such a call is made. When get or wait is invoked, f will execute synchronously, i.e., the caller will block until f finishes running. If neither get nor wait is called, f will never run.

default launch policy

The default launch policy is neither of these; it's the OR of the two. Which one actually applies depends on the system's scheduling policy and the conditions at the time.

auto fut1 = std::async(f);                          // run f using default launch policy
auto fut2 = std::async(std::launch::async |         // run f either async or deferred
                       std::launch::deferred,
                       f);

Given a thread t executing this statement,

auto fut = std::async(f);                   // run f using default launch policy

  • It's not possible to predict whether f will run concurrently with t, because f might be scheduled to run deferred.
  • It's not possible to predict whether f runs on a thread different from the thread invoking get or wait on fut. If that thread is t, the implication is that it's not possible to predict whether f runs on a thread different from t.
  • It may not be possible to predict whether f runs at all, because it may not be possible to guarantee that get or wait will be called on fut along every path through the program.

The upshot of these various considerations is that using std::async with the default launch policy for a task is fine as long as the following conditions are fulfilled:

  • The task need not run concurrently with the thread calling get or wait.
  • It doesn’t matter which thread’s thread_local variables are read or written.
  • Either there’s a guarantee that get or wait will be called on the future returned by std::async or it’s acceptable that the task may never execute.
  • Code using wait_for or wait_until takes the possibility of deferred status into account.

An example of the fourth point:

using namespace std::literals;              // for C++14 duration suffixes

void f()                                    // f sleeps for 1 second, then returns
{
    std::this_thread::sleep_for(1s);
}

auto fut = std::async(f);                   // run f asynchronously (conceptually)

while (fut.wait_for(100ms) !=               // loop until f has finished running...
       std::future_status::ready)
{                                           // ...which may never happen!
}

The problem: if f is deferred, fut.wait_for will always return std::future_status::deferred. That will never be equal to std::future_status::ready, so the loop will never terminate. Worse, this kind of bug is very hard to find, because it may manifest itself only under heavy loads. Those are the conditions that push the machine toward oversubscription or thread exhaustion, and that's when a task may be most likely to be deferred.

The fix: use wait_for(0s) to check whether the task is deferred.

auto fut = std::async(f);                   // as above

if (fut.wait_for(0s) ==                     // if task is deferred...
    std::future_status::deferred)
{
    // ...use wait or get on fut
    // to call f synchronously
} else {                                    // task isn't deferred
    while (fut.wait_for(100ms) !=           // infinite loop not possible (assuming f finishes)
           std::future_status::ready) {
        // task is neither deferred nor ready,
        // so do concurrent work until it's ready
    }
    // fut is ready
}

std::launch::async

If any of these conditions fails to hold, guarantee truly asynchronous execution:

auto fut = std::async(std::launch::async, f);  // launch f asynchronously

A helper that bakes std::launch::async into the call to std::async:

template<typename F, typename... Ts>
inline
std::future<typename std::result_of<F(Ts...)>::type>
reallyAsync(F&& f, Ts&&... params)          // return future for asynchronous
{                                           // call to f(params...)
    return std::async(std::launch::async,
                      std::forward<F>(f),
                      std::forward<Ts>(params)...);
}

// usage:
auto fut = reallyAsync(f);                  // run f asynchronously; throw if std::async would throw

// C++14
template<typename F, typename... Ts>
inline
auto
reallyAsync(F&& f, Ts&&... params)
{
    return std::async(std::launch::async,
                      std::forward<F>(f),
                      std::forward<Ts>(params)...);
}

Things to Remember

  • The default launch policy for std::async permits both asynchronous and synchronous task execution.
  • This flexibility leads to uncertainty when accessing thread_locals, implies that the task may never execute, and affects program logic for timeout-based wait calls.
  • Specify std::launch::async if asynchronous task execution is essential.

Item 37: Make std::threads unjoinable on all paths.

Definition: a std::thread corresponding to an underlying thread that's blocked or waiting to be scheduled is joinable.

Unjoinable std::thread objects include:

  • Default-constructed std::threads. Such std::threads have no function to execute, hence don’t correspond to an underlying thread of execution.
  • std::thread objects that have been moved from. The result of a move is that the underlying thread of execution a std::thread used to correspond to (if any) now corresponds to a different std::thread.
  • std::threads that have been joined. After a join, the std::thread object no longer corresponds to the underlying thread of execution that has finished running.
  • std::threads that have been detached. A detach severs the connection between a std::thread object and the underlying thread of execution it corresponds to.

One reason a std::thread’s joinability is important is that if the destructor for a joinable thread is invoked, execution of the program is terminated.

constexpr auto tenMillion = 10000000;
// C++14: an apostrophe can be used as a digit separator
// constexpr auto tenMillion = 10'000'000;

bool doWork(std::function<bool(int)> filter,      // returns whether computation was performed
            int maxVal = tenMillion)
{
    std::vector<int> goodVals;                    // values that satisfy filter

    std::thread t([&filter, maxVal, &goodVals]    // populate goodVals
                  {
                      for (auto i = 0; i <= maxVal; ++i)
                      { if (filter(i)) goodVals.push_back(i); }
                  });

    auto nh = t.native_handle();                  // use t's native handle to set t's priority

    if (conditionsAreSatisfied()) {
        t.join();                                 // let t finish
        performComputation(goodVals);
        return true;                              // computation was performed
    }
    return false;                                 // computation was not performed
}

Why not use a task here? We require use of the thread's native handle (to set the thread's priority), and that's accessible only through the std::thread API; the task-based API (i.e., futures) doesn't provide it.

A small wart in the code: a better design would be to start t in a suspended state (thus making it possible to adjust its priority before it does any computation). That is omitted here to keep the focus on the problem below.

If conditionsAreSatisfied() returns false or throws an exception, the std::thread object t will be joinable when its destructor is called at the end of doWork. That would cause program execution to be terminated. Why does the std::thread destructor behave this way? Because the two other obvious options are arguably worse. They are:

  • An implicit join. In this case, a std::thread’s destructor would wait for its underlying asynchronous thread of execution to complete. That sounds reasonable, but it could lead to performance anomalies that would be difficult to track down. For example, it would be counterintuitive that doWork would wait for its filter to be applied to all values if conditionsAreSatisfied() had already returned false.
  • An implicit detach. In this case, a std::thread's destructor would sever the connection between the std::thread object and its underlying thread of execution. The underlying thread would continue to run, and the debugging problems it can lead to are worse. In doWork, for example, goodVals is a local variable that is captured by reference. It's also modified inside the lambda (via the call to push_back). Suppose, then, that while the lambda is running asynchronously, conditionsAreSatisfied() returns false. In that case, doWork would return, and its local variables (including goodVals) would be destroyed. Its stack frame would be popped, and execution of its thread would continue at doWork's call site. The code running after doWork's call site and the push_back calls still executing in t's lambda could thus end up using the same stack memory, producing errors that are very hard to debug.

The Standard therefore assigns the duty of guaranteeing unjoinability to the programmer. Which paths have to be covered?

Covering every path can be complicated: it includes flowing off the end of the scope as well as jumping out via a return, continue, break, goto, or exception. To avoid enumerating every path, consider the following strategy.

Write an RAII std::thread wrapper that guarantees unjoinability

Any time you want to perform some action along every path out of a block, the normal approach is to put that action in the destructor of a local object. Such objects are known as RAII objects. RAII classes are common in the Standard Library. Examples include the STL containers, the standard smart pointers, std::fstream objects, and many more.

class ThreadRAII {
public:
    enum class DtorAction { join, detach };

    ThreadRAII(std::thread&& t, DtorAction a)   // in dtor, take action a on t
        : action(a), t(std::move(t)) {}

    ~ThreadRAII()
    {
        if (t.joinable()) {
            if (action == DtorAction::join) {
                t.join();
            } else {
                t.detach();
            }
        }
    }

    std::thread& get() { return t; }

private:
    DtorAction action;
    std::thread t;
};

Notes:

  • Recall that std::thread objects aren’t copyable.
  • That order puts the std::thread object last. In this class, the order makes no difference, but in general, it’s possible for the initialization of one data member to depend on another, and because std::thread objects may start running a function immediately after they are initialized, it’s a good habit to declare them last in a class. That guarantees that at the time they are constructed, all the data members that precede them have already been initialized and can therefore be safely accessed by the asynchronously running thread that corresponds to the std::thread data member.
  • ThreadRAII offers a get function to provide access to the underlying std::thread object, just as the smart pointers' get functions provide access to the underlying raw pointers.
  • The point of if (t.joinable()): invoking join or detach on an unjoinable thread yields undefined behavior. It's possible that a client constructed a std::thread, created a ThreadRAII object from it, used get to acquire access to t, and then did a move from t or called join or detach on it. Each of those actions would render t unjoinable.
  • Doesn't the if (t.joinable()) followed by t.join() in the destructor constitute a race condition, given that between execution of t.joinable() and invocation of join or detach another thread could render t unjoinable? No: at the time a ThreadRAII object's destructor is invoked, no other thread should be making member function calls on that object. If there are simultaneous calls, there is certainly a race, but it isn't inside the destructor; it's in the client code that is trying to invoke two member functions (the destructor and something else) on one object at the same time. In general, simultaneous member function calls on a single object are safe only if all are to const member functions.

doWork revised to use the RAII class:

bool doWork(std::function<bool(int)> filter,      // as before
            int maxVal = tenMillion)
{
    std::vector<int> goodVals;                    // as before
    ThreadRAII t(                                 // use RAII object
        std::thread([&filter, maxVal, &goodVals]
                    {
                        for (auto i = 0; i <= maxVal; ++i)
                        { if (filter(i)) goodVals.push_back(i); }
                    }),
        ThreadRAII::DtorAction::join              // RAII action
    );

    auto nh = t.get().native_handle();

    if (conditionsAreSatisfied()) {
        t.get().join();
        performComputation(goodVals);
        return true;
    }
    return false;
}

Adding move support

because ThreadRAII declares a destructor, there will be no compiler-generated move operations, but there is no reason ThreadRAII objects shouldn’t be movable.

class ThreadRAII {
public:
    enum class DtorAction { join, detach };       // as before

    ThreadRAII(std::thread&& t, DtorAction a)     // as before
        : action(a), t(std::move(t)) {}

    ~ThreadRAII()
    {
        // ... as before
    }

    ThreadRAII(ThreadRAII&&) = default;           // support moving
    ThreadRAII& operator=(ThreadRAII&&) = default;

    std::thread& get() { return t; }              // as before

private:                                          // as before
    DtorAction action;
    std::thread t;
};

Things to Remember

  • Make std::threads unjoinable on all paths.
  • join-on-destruction can lead to difficult-to-debug performance anomalies.
  • detach-on-destruction can lead to difficult-to-debug undefined behavior.
  • Declare std::thread objects last in lists of data members.

Item 38: Be aware of varying thread handle destructor behavior.

Both std::thread objects and future objects can be thought of as handles to system threads.

Where is the callee's result stored?

  1. It can't live on the callee's side (the callee may be destroyed at any time, taking the result with it). The callee could finish before the caller invokes get on a corresponding future, so the result can't be stored in the callee's std::promise. That object, being local to the callee, would be destroyed when the callee finished.
  2. Nor can it live on the caller's side: once a std::shared_future appears, there's no way to say which caller should hold it. The result can't be stored in the caller's future, because (among other reasons) a std::future may be used to create a std::shared_future (thus transferring ownership of the callee's result from the std::future to the std::shared_future), which may then be copied many times after the original std::future is destroyed. Given that not all result types can be copied (i.e., move-only types) and that the result must live at least as long as the last future referring to it, which of the potentially many futures corresponding to the callee should be the one to contain its result?

The result has to live in a third place. This location is known as the shared state. The shared state is typically represented by a heap-based object, but its type, interface, and implementation are not specified by the Standard.

behavior of a future’s destructor

the behavior of a future’s destructor—the topic of this Item—is determined by the shared state associated with the future.

  • The destructor for the last future referring to a shared state for a non-deferred task launched via std::async blocks until the task completes. In essence, the destructor for such a future does an implicit join on the thread on which the asynchronously executing task is running.
  • The destructor for all other futures simply destroys the future object. For asynchronously running tasks, this is akin to an implicit detach on the underlying thread. For deferred tasks for which this is the final future, it means that the deferred task will never run.

The rules above sound complicated, so here's a restatement.

Only when all of these conditions are fulfilled does a future’s destructor exhibit special behavior, and that behavior is to block until the asynchronously running task completes. Practically speaking, this amounts to an implicit join with the thread running the std::async-created task.

  • It refers to a shared state that was created due to a call to std::async.
  • The task’s launch policy is std::launch::async, either because that was chosen by the runtime system or because it was specified in the call to std::async.
  • The future is the last future referring to the shared state. For std::futures, this will always be the case.

Why does a future's destructor matter? For example, when a class embeds a future, the destruction behavior of the class's instances is affected by the future subobject:

// both of these may block in their destructors:
std::vector<std::future<void>> futs;

class Widget {
public:
    ...
private:
    std::shared_future<double> fut;
};

How to tell which destructor behavior a future will exhibit

The API provides no way to query this. But if you have a way of knowing that a given future does not satisfy the conditions that trigger the special destructor behavior (e.g., due to program logic), you're assured that that future won't block in its destructor.

For example, only shared states arising from calls to std::async qualify for the special behavior, but there are other ways that shared states get created. One is the use of std::packaged_task.

A std::packaged_task object prepares a function (or other callable object) for asynchronous execution by wrapping it such that its result is put into a shared state. A future referring to that shared state can then be obtained via std::packaged_task’s get_future function:

int calcValue();                            // func to run
std::packaged_task<int()> pt(calcValue);    // wrap calcValue so it can run asynchronously

auto fut = pt.get_future();                 // get future for pt

the future fut doesn’t refer to a shared state created by a call to std::async, so its destructor will behave normally.

Once created, the std::packaged_task pt can be run on a thread. (It could be run via a call to std::async, too, but if you want to run a task using std::async, there’s little reason to create a std::packaged_task, because std::async does everything std::packaged_task does before it schedules the task for execution.)

std::packaged_tasks aren't copyable, so pt must be cast to an rvalue:

std::thread t(std::move(pt));               // run pt on t

Putting it all together:

{                                           // begin block
    std::packaged_task<int()> pt(calcValue);

    auto fut = pt.get_future();

    std::thread t(std::move(pt));

    // ... (see below)

}                                           // end block

When you have a future corresponding to a shared state that arose due to a std::packaged_task, there's usually no need to adopt a special destruction policy, because the decision among termination, joining, or detaching will be made in the code that manipulates the std::thread (the "..." part above) on which the std::packaged_task is typically run.

Things to Remember

  • Future destructors normally just destroy the future’s data members.
  • The final future referring to a shared state for a non-deferred task launched via std::async blocks until the task completes.

Item 39: Consider void futures for one-shot event communication.

There are four approaches to inter-thread communication:

1. condition variable (condvar)

the reacting task waits on a condition variable, and the detecting thread notifies that condvar when the event occurs.

std::condition_variable cv;                 // condvar for event
std::mutex m;                               // mutex for use with cv

// detecting task:
// ...detect event...
cv.notify_one();                            // tell reacting task
cv.notify_all();                            // (if there are multiple reacting tasks)

// reacting task:
// ...prepare to react...
{                                           // open critical section
    std::unique_lock<std::mutex> lk(m);     // lock mutex
    cv.wait(lk);                            // wait for notify; this isn't correct!
    // ...react to event (m is locked)...
}                                           // close crit. section; unlock m via lk's dtor
// ...continue reacting (m now unlocked)...

Problem 1: the mutex may be unneeded

Mutexes are used to control access to shared data, but it’s entirely possible that the detecting and reacting tasks have no need for such mediation.

For example, the detecting task might be responsible for initializing a global data structure, then turning it over to the reacting task for use. If the detecting task never accesses the data structure after initializing it, and if the reacting task never accesses it before the detecting task indicates that it’s ready, the two tasks will stay out of each other’s way through program logic. There will be no need for a mutex.

Problem 2: if the detecting task notifies the condvar before the reacting task waits, the reacting task will hang.

If the notification is issued before the reacting task starts waiting, the notification is lost; the reacting task then waits forever for a notification that has already come and gone.

Problem 3: the wait statement fails to account for spurious wakeups.

Spurious wakeups are common in condition-variable implementations; see the Wikipedia article on spurious wakeups for details.

To handle spurious wakeups: proper code deals with them by confirming that the condition being waited for has truly occurred, and it does this as its first action after waking. The C++ condvar API makes this exceptionally easy, because it permits a lambda (or other function object) that tests for the waited-for condition to be passed to wait.

cv.wait(lk,
        []{ return whether the event has occurred; });  // condition test (pseudocode)

But taking advantage of this requires the reacting thread to be able to test for the condition, and it may have no way of determining whether the event it's waiting for has taken place.

2. shared boolean flag (std::atomic)

std::atomic<bool> flag(false);              // shared flag

// detecting task:
// ...detect event...
flag = true;                                // tell reacting task

// reacting task:
// ...prepare to react...
while (!flag);                              // wait for event
// ...react to event...

Problem: cost. The wait isn't a true block; it's polling.

During the time the task is waiting for the flag to be set, the task is essentially blocked, yet it’s still running.

That’s an advantage of the condvar-based approach, because a task in a wait call is truly blocked.

3. combine the condvar and flag-based designs

std::condition_variable cv;                 // as before
std::mutex m;
bool flag(false);                           // not std::atomic

// detecting task:
// ...detect event...
{
    std::lock_guard<std::mutex> g(m);       // lock m via g's ctor
    flag = true;                            // tell reacting task (part 1)
}                                           // unlock m via g's dtor
cv.notify_one();                            // tell reacting task (part 2)

// reacting task:
// ...prepare to react...
{                                           // as before
    std::unique_lock<std::mutex> lk(m);     // as before
    cv.wait(lk, [] { return flag; });       // use lambda to avoid spurious wakeups
    // ...react to event (m is locked)...
}
// ...continue reacting (m now unlocked)...

Problem: it doesn't seem terribly clean; in particular, even when the notification arrives, the flag must still be rechecked:

Notifying the condition variable tells the reacting task that the event it’s been waiting for has probably occurred, but the reacting task must check the flag to be sure. Setting the flag tells the reacting task that the event has definitely occurred, but the detecting task still has to notify the condition variable so that the reacting task will awaken and check the flag.

4. use std::future<void>

std::promise<void> p;                       // promise for communications channel

// detecting task:
// ...detect event...
p.set_value();                              // tell reacting task

// reacting task:
// ...prepare to react...
p.get_future().wait();                      // wait on future corresponding to p
// ...react to event...

A std::promise and a future form a communications channel between threads.

Why void?

there’s no data to be conveyed. The only thing of interest to the reacting task is that its future has been set. What we need for the std::promise and future templates is a type that indicates that no data is to be conveyed across the communications channel. That type is void. The detecting task will thus use a std::promise<void>, and the reacting task a std::future<void> or std::shared_future<void>.

Drawbacks of this approach

  1. between a std::promise and a future is a shared state, and shared states are typically dynamically allocated. You should therefore assume that this design incurs the cost of heap-based allocation and deallocation.
  2. a std::promise may be set only once. The communications channel between a std::promise and a future is a one-shot mechanism: it can’t be used repeatedly. This is a notable difference from the condvar- and flag-based designs, both of which can be used to communicate multiple times. (A condvar can be repeatedly notified, and a flag can always be cleared and set again.)
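
A minimal sketch of the one-shot nature: a second call to set_value throws std::future_error:

#include <future>
#include <iostream>

int main()
{
    std::promise<void> p;
    p.set_value();                          // first set: fine
    try {
        p.set_value();                      // second set: the channel is one-shot
    } catch (const std::future_error& e) {
        // e.code() == std::future_errc::promise_already_satisfied
        std::cout << "caught: " << e.what() << '\n';
    }
}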

An example application of one-shot communication

Creating a system thread in a suspended state. Why suspend it rather than run it immediately?

  1. you’d like to get all the overhead associated with thread creation out of the way so that when you’re ready to execute something on the thread, the normal thread-creation latency will be avoided.
  2. Or you might want to create a suspended thread so that you could configure it before letting it run (thread characteristics such as priority and affinity).

Example:

std::promise<void> p;

void react();                               // func for reacting task

void detect()                               // func for detecting task
{
    std::thread t([]                        // create thread
                  {
                      p.get_future().wait();  // suspend t until future is set
                      react();
                  });

    // here, t is suspended prior to call to react

    p.set_value();                          // unsuspend t (and thus call react)

    // do additional work

    t.join();                               // make t unjoinable
}

One pitfall: per Item 37, t must be made unjoinable on every path out of detect, so it's tempting to automate that with a local ThreadRAII object.

void detect()
{
    ThreadRAII tr(                          // use RAII object
        std::thread([]
                    {
                        p.get_future().wait();
                        react();
                    }),
        ThreadRAII::DtorAction::join        // risky! (see below)
    );

    ...                                     // thread inside tr is suspended here

    p.set_value();                          // unsuspend thread inside tr

    ...
}

The problem: if the "..." code before p.set_value() throws, p.set_value() never executes, and detect hangs, because tr's destructor will never complete: the thread inside tr is still waiting on a future that will never be set.

The author raised this problem on his blog, and many readers proposed their own solutions.

In my view, the root cause is that the lifetimes of std::promise<void> p and ThreadRAII tr are managed in inconsistent ways. My fix follows the same RAII idea, this time wrapping std::promise<void>: if set_value hasn't been called manually by the time the wrapper is destroyed, the destructor calls it. Calling set_value manually remains the default path.

class PromiseRAII {
public:
    PromiseRAII(std::promise<void>&& p)
        : pro(std::move(p)) {}

    ~PromiseRAII()
    {
        if (!manually_set) {
            pro.set_value();                // not set manually: set on destruction
        }
    }

    void set_value()                        // the normal, manual path
    {
        manually_set = true;
        pro.set_value();
    }

    PromiseRAII(PromiseRAII&&) = default;
    PromiseRAII& operator=(PromiseRAII&&) = default;

    std::promise<void>& get() { return pro; }

private:
    bool manually_set{false};
    std::promise<void> pro;
};

To extend this to suspending multiple reacting threads, use std::shared_future<void>:

std::promise<void> p;                       // as before

void detect()                               // now for multiple reacting tasks
{
    auto sf = p.get_future().share();       // sf's type is std::shared_future<void>
    std::vector<std::thread> vt;            // container for reacting threads

    for (int i = 0; i < threadsToRun; ++i) {
        vt.emplace_back([sf]{ sf.wait();    // wait on local copy of sf
                              react(); });
    }

    // ...detect hangs if this code throws!...

    p.set_value();                          // unsuspend all threads

    // ...

    for (auto& t : vt) {                    // make all threads unjoinable
        t.join();
    }
}

Things to Remember

  • For simple event communication, condvar-based designs require a superfluous mutex, impose constraints on the relative progress of detecting and reacting tasks, and require reacting tasks to verify that the event has taken place.
  • Designs employing a flag avoid those problems, but are based on polling, not blocking.
  • A condvar and flag can be used together, but the resulting communications mechanism is somewhat stilted.
  • Using std::promises and futures dodges these issues, but the approach uses heap memory for shared states, and it’s limited to one-shot communication.

Item 40: Use std::atomic for concurrency, volatile for special memory.

Introduction to the std::atomic template

Instantiations of this template offer operations that are guaranteed to be seen as atomic by other threads. Once a std::atomic object has been constructed, operations on it behave as if they were inside a mutex-protected critical section, but the operations are generally implemented using special machine instructions that are more efficient than would be the case if a mutex were employed. For example:

std::atomic<int> ai(0);                     // initialize ai to 0
ai = 10;                                    // atomically set ai to 10
std::cout << ai;                            // atomically read ai's value
++ai;                                       // atomically increment ai to 11
--ai;                                       // atomically decrement ai to 10

Two aspects of this example are worth noting.

  1. in the std::cout << ai; statement, the fact that ai is a std::atomic guarantees only that the read of ai is atomic. There is no guarantee that the entire statement proceeds atomically. Between the time ai’s value is read and operator<< is invoked to write it to the standard output, another thread may have modified ai’s value. That has no effect on the behavior of the statement, because operator<< for ints uses a by-value parameter for the int to output (the outputted value will therefore be the one that was read from ai).
  2. the increment and decrement of ai. These are each read-modify-write (RMW) operations, yet they execute atomically. This is one of the nicest characteristics of the std::atomic types: once a std::atomic object has been constructed, all member functions on it, including those comprising RMW operations, are guaranteed to be seen by other threads as atomic.
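
A minimal sketch contrasting the two (not from the book; the concurrent increment of the volatile counter is a data race, hence formally UB, but in practice it typically shows up as lost updates):

#include <atomic>
#include <iostream>
#include <thread>

int main()
{
    std::atomic<int> ac(0);                 // atomic counter: RMW increments are atomic
    volatile int vc = 0;                    // volatile counter: no atomicity guarantee

    auto work = [&]{ for (int i = 0; i < 1000000; ++i) { ++ac; ++vc; } };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();

    std::cout << "atomic:   " << ac << '\n';  // always 2000000
    std::cout << "volatile: " << vc << '\n';  // typically less: lost RMW updates
}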

Why volatile is not up to multithreaded jobs

No guarantee of operation atomicity with volatile

In contrast, the corresponding code using volatile guarantees virtually nothing in a multithreaded context.

The book gives a classic characterization of undefined behavior, worth quoting:
Undefined behavior means that compilers may generate code to do literally anything. Compilers don't use this leeway to be malicious, of course. Rather, they perform optimizations that would be valid in programs without data races, and these optimizations yield unexpected and unpredictable behavior in programs where races are present.

Insufficient restrictions on code reordering with volatile

std::atomic<bool> valAvailable(false);
auto imptValue = computeImportantValue();   // compute value
valAvailable = true;                        // tell other task it's available

As humans reading this code, we know it’s crucial that the assignment to imptValue take place before the assignment to valAvailable, but all compilers see is a pair of assignments to independent variables.

a = b;
x = y;

// compilers may generally reorder them as follows:
x = y;
a = b;

Even if compilers don’t reorder them, the underlying hardware might do it.

std::atomic imposes restrictions on how code can be reordered (with the default sequential-consistency policy).

Where volatile fits: normal memory vs. special memory

Suppressing certain compiler optimizations

```c++
// redundant loads
auto y = x; // read x
y = x;      // read x again

// dead stores
x = 10;     // write x
x = 20;     // write x again
```

Compilers can treat it as if it had been written like this:

```c++
auto y = x; // read x
x = 20;     // write x
```

Why does such code arise at all (programmers rarely write it by hand)? After compilers take reasonable-looking source code and perform template instantiation, inlining, and various common kinds of reordering optimizations, it’s not uncommon for the result to have redundant loads and dead stores that compilers can get rid of.

Probably the most common kind of special memory is memory used for memory-mapped I/O. Locations in such memory actually communicate with peripherals, e.g., external sensors or displays, printers, network ports, etc. rather than reading or writing normal memory (i.e., RAM). In such a context, consider again the code with seemingly redundant reads:

```c++
auto y = x; // read x
y = x;      // read x again
```

If x corresponds to the value reported by a temperature sensor, the second read of x is not redundant, because the temperature may have changed between the first and second reads.

```c++
x = 10; // write x
x = 20; // write x again
```

If x corresponds to the control port for a radio transmitter, it could be that the code is issuing commands to the radio, and the value 10 corresponds to a different command from the value 20. Optimizing out the first assignment would change the sequence of commands sent to the radio.

volatile is the way we tell compilers that we’re dealing with special memory. “Don’t perform any optimizations on operations on this memory.”

The type of y here is int: for declarations of non-reference, non-pointer types (which is the case for y), const and volatile qualifiers are dropped during type deduction. Nevertheless, compilers must perform both the initialization of and the assignment to y, because x is volatile, so the second read of x might yield a different value from the first one.
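
A sketch of what special-memory code might look like; the register address is hypothetical and only meaningful on hardware where it is memory-mapped I/O, so don't run this on a hosted system:

```c++
#include <cstdint>

// Hypothetical memory-mapped temperature-sensor register (made-up address).
volatile std::uint32_t* const tempSensor =
    reinterpret_cast<volatile std::uint32_t*>(0x40000000);

void pollTwice()
{
    auto t1 = *tempSensor; // real hardware read (t1 deduced as std::uint32_t)
    auto t2 = *tempSensor; // NOT optimized away: the pointee is volatile, and
                           // the temperature may have changed between reads
    (void)t1;
    (void)t2;
}
```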

std::atomic is a poor fit for special memory

What we intend is:

```c++
std::atomic<int> x;
auto y = x; // conceptually read x (see below)
y = x;      // conceptually read x again (see below)
x = 10;     // write x
x = 20;     // write x again
```

But this doesn't even compile:

```c++
auto y = x; // error!
y = x;      // error!
```

Why? Because the copy operations for std::atomic are deleted.

Why are the copy operations deleted?

Consider what would happen if the initialization of y with x compiled. Because x is std::atomic, y's type would be deduced to be std::atomic, too. In order for the copy construction of y from x to be atomic, compilers would have to generate code to read x and write y in a single atomic operation. Hardware generally can't do that, so copy construction (and copy assignment) isn't supported for std::atomic types. (The move operations aren't explicitly declared in std::atomic, so std::atomic offers neither move construction nor move assignment.)

Instead, use std::atomic's member functions load and store:

```c++
std::atomic<int> y(x.load()); // read x
y.store(x.load());            // read x again
```

But neither statement above is atomic as a whole anymore. This compiles, but the fact that reading x (via x.load()) is a separate function call from initializing or storing to y makes clear that there is no reason to expect either statement as a whole to execute as a single atomic operation.

Given that code, compilers could “optimize” it by storing x's value in a register instead of reading it twice (shown below as conceptual pseudocode, not valid C++):

```c++
register = x.load();          // read x into register
std::atomic<int> y(register); // init y with register value
y.store(register);            // store register value into y
```

Because std::atomic and volatile serve different purposes, they can even be used together:

```c++
volatile std::atomic<int> vai; // operations on vai are atomic
                               // and can't be optimized away
```

This could be useful if vai corresponded to a memory-mapped I/O location that was concurrently accessed by multiple threads.

Things to Remember

  • std::atomic is for data accessed from multiple threads without using mutexes. It’s a tool for writing concurrent software.
  • volatile is for memory where reads and writes should not be optimized away. It’s a tool for working with special memory.

CHAPTER 8 Tweaks

Item 41: Consider pass by value for copyable parameters that are cheap to move and always copied.

Drawbacks of the by-reference approaches

```c++
// Approach 1: use overloading for lvalues and rvalues
class Widget {
public:
    void addName(const std::string& newName)
    { names.push_back(newName); }

    void addName(std::string&& newName)
    { names.push_back(std::move(newName)); }

private:
    std::vector<std::string> names;
};

// Approach 2: use a universal reference
class Widget {
public:
    template<typename T>
    void addName(T&& newName)
    { names.push_back(std::forward<T>(newName)); }

};

// Approach 3: pass by value
class Widget {
public:
    void addName(std::string newName)
    { names.push_back(std::move(newName)); }

};
```

The first two are by-reference approaches; the third is the pass-by-value approach.

Drawbacks of Approach 1 (overloading for lvalues and rvalues)

  • Poor maintainability: you're writing two functions that do essentially the same thing.
  • Code bloat when not inlined: there are two functions in the object code. Here both will probably be inlined, but if they aren't inlined everywhere, you really will get two functions in your object code.

Drawbacks of Approach 2 (universal reference)

  • Code bloat from template instantiation, i.e., bloated header files: as a template, addName's implementation must typically be in a header file. It may yield several functions in object code, because it instantiates differently not only for lvalues and rvalues, but also for std::string and for types convertible to std::string.

  • The chronic problems of universal references:

    • Odd failure cases: some argument types can't be passed by universal reference.
    • Confounding error messages: if clients pass improper argument types, compiler error messages can be intimidating.

Why pass by value can be the right choice

It avoids the drawbacks of both by-reference approaches, and (under the preconditions discussed below) the cost is not that large.

The cost is in efficiency (the analysis largely ignores the possibility of compilers optimizing copy and move operations away, because such optimizations are context- and compiler-dependent):

|            | Lvalues                                | Rvalues                                |
| ---------- | -------------------------------------- | -------------------------------------- |
| Approach 1 | one copy                               | one move                               |
| Approach 2 | one copy (might be uniquely efficient) | one move (might be uniquely efficient) |
| Approach 3 | one copy plus one move                 | two moves                              |

Approach 2 can even be uniquely efficient: if a caller passes an argument of a type other than std::string, it will be forwarded to a std::string constructor, and that could cause as few as zero std::string copy or move operations to be performed. Functions taking universal references can thus be uniquely efficient. (A concrete tally for Approach 3 follows.)
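
To make the table concrete, here is a small instrumented type (Probe and its counters are invented for this sketch) that tallies copy and move constructions under Approach 3:

```c++
#include <iostream>
#include <utility>
#include <vector>

int copies = 0, moves = 0;

struct Probe {
    Probe() = default;
    Probe(const Probe&) { ++copies; }
    Probe(Probe&&) noexcept { ++moves; }
};

void addByValue(std::vector<Probe>& v, Probe p) // Approach 3
{ v.push_back(std::move(p)); }

int main()
{
    std::vector<Probe> v;
    v.reserve(4); // avoid reallocation noise in the counts

    Probe lv;
    addByValue(v, lv);             // lvalue: one copy (into p) + one move
    Probe lv2;
    addByValue(v, std::move(lv2)); // rvalue: two moves

    std::cout << "copies=" << copies << " moves=" << moves << '\n';
    // prints: copies=1 moves=3
}
```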

Understanding the Item's title requires a close reading:

Consider pass by value for copyable parameters that are cheap to move and always copied.

Only by weighing each word can you see exactly when pass-by-value should be used:

  • You should only consider using pass by value. Weigh the overall costs and benefits first.

  • Consider pass by value only for copyable parameters. Move-only types can only be moved, so the “overloading” solution needs just one overload: the one taking an rvalue reference.

  • Pass by value is worth considering only for parameters that are cheap to move. When moves are cheap, the cost of an extra one may be acceptable.

  • You should consider pass by value only for parameters that are always copied.

    ```c++
    class Widget {
    public:
        void addName(std::string newName)
        {
            if ((newName.length() >= minLen) &&
                (newName.length() <= maxLen))
            {
                names.push_back(std::move(newName));
            }
        }

    private:
        std::vector<std::string> names;
    };
    ```

    This function incurs the cost of constructing and destroying newName even if nothing is added to names. That's a price the by-reference approaches wouldn't be asked to pay. In other words, the analysis so far assumes an unconditional copy, and for unconditional copies there is a further distinction: a function can copy a parameter in two ways:

Via construction vs. via assignment

  • via construction (i.e., copy construction or move construction).
    Consistent with the conclusion above: using pass by value incurs the cost of an extra move for both lvalue and rvalue arguments.

  • via assignment (i.e., copy assignment or move assignment). This case is more subtle; the difference lies in the allocation of heap memory.

    ```c++
    class Password {
    public:
        explicit Password(std::string pwd) // pass by value; construct text
        : text(std::move(pwd)) {}

        void changeTo(std::string newPwd)  // pass by value; assign text
        { text = std::move(newPwd); }

    private:
        std::string text; // text of password
    };

    // set the password
    std::string initPwd("Supercalifragilisticexpialidocious");
    Password p(initPwd);
    ```

    Up to this point the efficiency analysis is unchanged. But when the password is reset, assignment comes into play, and the picture changes. For data structures that use dynamically allocated memory, such as `std::string` (and `std::vector`), assignment may add the cost of an allocation-deallocation pair.

    ```c++
    std::string newPassword = "Beware the Jabberwock";
    p.changeTo(newPassword);
    ```

The argument passed to changeTo is an lvalue (newPassword), so when the parameter newPwd is constructed, it's the std::string copy constructor that's called. That constructor allocates memory to hold the new password. newPwd is then move-assigned to text, which causes the memory already held by text to be deallocated. There are thus two dynamic memory management actions within changeTo: one to allocate memory for the new password, and one to deallocate the memory for the old password.

With the overloading approach, however, this situation doesn't arise:

```c++
class Password {
public:

    void changeTo(const std::string& newPwd) // the overload for lvalues
    {
        text = newPwd; // can reuse text's memory if
                       // text.capacity() >= newPwd.size()
    }

private:
    std::string text; // as above
};
```

When the old password is longer than the new one, text's existing memory can be reused directly. Bear in mind that the cost of a memory allocation-deallocation pair can exceed that of a std::string move operation by orders of magnitude. If the old password were shorter than the new one, it would typically be impossible to avoid an allocation-deallocation pair during the assignment, and in that case, pass by value would run at about the same speed as pass by reference.

Also, don't forget std::string's small string optimization (SSO): strings short enough to fit in the SSO buffer involve no heap allocation at all.

Adopt a presumption of guilt toward pass-by-value: the most practical approach is a “guilty until proven innocent” policy, whereby you use overloading or universal references instead of pass by value unless it's been demonstrated that pass by value yields acceptably efficient code for the parameter type you need.

Other scenarios to weigh before using pass-by-value

  • The accumulated cost of chained calls: it's not always clear how many moves will take place. When there are chains of function calls, each of which employs pass by value because “it costs only one inexpensive move,” the cost for the entire chain may not be something you can tolerate (see the sketch after this list). Using by-reference parameter passing, chains of calls don't incur this kind of accumulated overhead.

  • The slicing problem: pass by value, unlike pass by reference, is susceptible to slicing. If you have a function that is designed to accept a parameter of a base class type or any type derived from it, you don't want to declare a pass-by-value parameter of that type.
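
A sketch (all names hypothetical) of the accumulated cost in a chain of pass-by-value calls mentioned above; each hop adds one move:

```c++
#include <string>
#include <utility>

std::string global; // final destination of the value

void store(std::string s)    { global = std::move(s); }  // move #3
void validate(std::string s) { store(std::move(s)); }    // move #2
void process(std::string s)  { validate(std::move(s)); } // move #1

// For an lvalue argument, process(name) costs one copy (into process's
// parameter) plus three moves: one per hop in the chain. With pass by
// reference, the chain would cost a single copy at the final destination.
```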

Conclusion

The upshot is that the extra cost of pass by value for functions that copy a parameter using assignment depends on the type being passed, the ratio of lvalue to rvalue arguments, whether the type uses dynamically allocated memory, and, if so, the implementation of that type’s assignment operators and the likelihood that the memory associated with the assignment target is at least as large as the memory associated with the assignment source.

C++11 doesn’t fundamentally change the C++98 wisdom regarding pass by value. In general, pass by value still entails a performance hit you’d prefer to avoid, and pass by value can still lead to the slicing problem. What’s new in C++11 is the distinction between lvalue and rvalue arguments. Implementing functions that take advantage of move semantics for rvalues of copyable types requires either overloading or using universal references, both of which have drawbacks. For the special case of copyable, cheap-to-move types passed to functions that always copy them and where slicing is not a concern, pass by value can offer an easy-to-implement alternative that’s nearly as efficient as its pass-by-reference competitors, but avoids their disadvantages.

Things to Remember

  • For copyable, cheap-to-move parameters that are always copied, pass by value may be nearly as efficient as pass by reference, it’s easier to implement, and it can generate less object code.
  • Copying parameters via construction may be significantly more expensive than copying them via assignment.
  • Pass by value is subject to the slicing problem, so it’s typically inappropriate for base class parameter types.

Item 42: Consider emplacement instead of insertion.

When emplacement outperforms insertion functions

Insertion functions take objects to be inserted, while emplacement functions take constructor arguments for objects to be inserted. This difference permits emplacement functions to avoid the creation and destruction of temporary objects that insertion functions can necessitate.

```c++
// C++11 push_back
template <class T,
          class Allocator = allocator<T>>
class vector {
public:

    void push_back(const T& x); // insert lvalue
    void push_back(T&& x);      // insert rvalue

};

std::vector<std::string> vs; // container of std::string
vs.push_back("xyzzy");       // add string literal
// what the compiler sees:
vs.push_back(std::string("xyzzy")); // create temp. std::string
                                    // and pass it to push_back

vs.emplace_back("xyzzy"); // construct std::string inside vs
                          // directly from "xyzzy"
```

Here’s what happens at runtime in the call to push_back:

  1. A temporary std::string object is created from the string literal "xyzzy". This object has no name; we'll call it temp. Construction of temp is the first std::string construction. Because it's a temporary object, temp is an rvalue.
  2. temp is passed to the rvalue overload for push_back, where it’s bound to the rvalue reference parameter x. A copy of x is then constructed in the memory for the std::vector. This construction—the second one—is what actually creates a new object inside the std::vector. (The constructor that’s used to copy x into the std::vector is the move constructor, because x, being an rvalue reference, gets cast to an rvalue before it’s copied.)
  3. Immediately after push_back returns, temp is destroyed, thus calling the std::string destructor.

emplace_back is available for every standard container that supports push_back. The insertion/emplacement correspondences:

push_back $\leftrightarrow$ emplace_back

push_front $\leftrightarrow$ emplace_front

insert $\leftrightarrow$ emplace (every container except std::forward_list and std::array)

insert taking a hint iterator $\leftrightarrow$ emplace_hint (associative containers)

insert_after $\leftrightarrow$ emplace_after (std::forward_list)
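
For instance, with an associative container (a minimal sketch, not from the book):

```c++
#include <map>
#include <string>

int main()
{
    std::map<int, std::string> m;

    m.emplace(1, "one");               // args forwarded to the node's
                                       // pair constructor
    m.emplace_hint(m.end(), 2, "two"); // emplacement counterpart of
                                       // insert with a hint iterator
    m.insert({3, "three"});            // insert takes a ready-made pair
}
```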

Can emplacement replace insertion everywhere? In theory, yes.

In practice there are exceptions, and they're hard to pin down: such situations are not easy to characterize, because they depend on

  • the types of arguments being passed
  • the containers being used
  • the locations in the containers where insertion or emplacement is requested
  • the exception safety of the contained types' constructors
  • for containers where duplicate values are prohibited (i.e., std::set, std::map, std::unordered_set, std::unordered_map), whether the value to be added is already in the container.

The usual performance-tuning advice thus applies: to determine whether emplacement or insertion runs faster, benchmark them both. The practical approach, in short, is to measure and compare.
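
A rough benchmarking sketch along those lines (microseconds is a helper written for this note; the numbers it prints are illustrative, not authoritative):

```c++
#include <chrono>
#include <iostream>
#include <string>
#include <vector>

constexpr int N = 1000000;

template <typename F>
long long microseconds(F f) // time a callable, in microseconds
{
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}

int main()
{
    auto pushUs = microseconds([] {
        std::vector<std::string> v;
        v.reserve(N);
        for (int i = 0; i < N; ++i)
            v.push_back("xyzzy");   // creates a temporary std::string per call
    });
    auto emplUs = microseconds([] {
        std::vector<std::string> v;
        v.reserve(N);
        for (int i = 0; i < N; ++i)
            v.emplace_back("xyzzy"); // constructs in place, no temporary
    });
    std::cout << "push_back: " << pushUs << "us, "
              << "emplace_back: " << emplUs << "us\n";
}
```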

there’s a heuristic that can help you identify situations where emplacement functions are most likely to be worthwhile. If all the following are true, emplacement will almost certainly outperform insertion:

  • The value being added is constructed into the container, not assigned.

    ```c++
    std::vector<std::string> vs; // as before; add elements to vs

    vs.emplace(vs.begin(), "xyzzy"); // add "xyzzy" to beginning of vs
    ```

    Here the new value goes into a location already occupied by an object (vs[0]), so the implementation typically move-assigns the value into place. Move assignment requires an object to move from, which means a temporary must be created to serve as the move source, and emplacement's edge over insertion tends to disappear.

    Node-based containers virtually always use construction to add new values, and most standard containers are node-based. The only ones that aren't are std::vector, std::deque, and std::string. (std::array isn't, either, but it doesn't support insertion or emplacement, so it's not relevant here.) Within the non-node-based containers, you can rely on emplace_back to use construction instead of assignment to get a new value into place, and for std::deque, the same is true of emplace_front. Emplacing at positions where assignment is used, as in the vs.begin() example above, may offer no efficiency advantage.

  • The argument type(s) being passed differ from the type held by the container.

    emplacement’s advantage over insertion generally stems from the fact that its interface doesn’t require creation and destruction of a temporary object when the argument(s) passed are of a type other than that held by the container.

  • The container is unlikely to reject the new value as a duplicate.

    Checking for a duplicate requires constructing a node. This criterion means that the container either permits duplicates or that most of the values you add will be unique. The reason it matters is that in order to detect whether a value is already in the container, emplacement implementations typically create a node with the new value so that they can compare the value of this node with existing container nodes. If the value to be added isn't in the container, the node is linked in. However, if the value is already present, the emplacement is aborted and the node is destroyed, meaning that the cost of its construction and destruction was wasted. Such nodes are created for emplacement functions more often than for insertion functions.

The following calls meet all the criteria above, so emplacement should be used:

```c++
vs.emplace_back("xyzzy");
// constructs the new value at the end of the container;
// the argument type differs from the container's element type;
// the container doesn't reject duplicates
vs.emplace_back(50, 'x'); // ditto
```

Exception 1: containers of resource-managing objects

```c++
std::list<std::shared_ptr<Widget>> ptrs;
void killWidget(Widget* pWidget); // custom deleter

ptrs.push_back(std::shared_ptr<Widget>(new Widget, killWidget));
ptrs.push_back({ new Widget, killWidget }); // same as above

ptrs.emplace_back(new Widget, killWidget);
```

In this scenario, insertion is more exception-safe than emplacement.

The reason is that insertion constructs a temporary resource-managing object first, so unlike emplacement, an exception does not leak the resource.

The insertion sequence:

  1. In either call above, a temporary std::shared_ptr<Widget> object is constructed to hold the raw pointer resulting from new Widget. Call this object temp.
  2. push_back takes temp by reference. During allocation of a list node to hold a copy of temp, an out-of-memory exception gets thrown.
  3. As the exception propagates out of push_back, temp is destroyed. Being the sole std::shared_ptr referring to the Widget it’s managing, it automatically releases that Widget, in this case by calling killWidget.

By contrast, the emplacement sequence:

  1. The raw pointer resulting from new Widget is perfect-forwarded to the point inside emplace_back where a list node is to be allocated. That allocation fails, and an out-of-memory exception is thrown.
  2. As the exception propagates out of emplace_back, the raw pointer that was the only way to get at the Widget on the heap is lost. That Widget (and any resources it owns) is leaked.

The deeper reason: for insertion functions, the functions' parameter types generally ensure that nothing gets between acquisition of a resource (e.g., use of new) and construction of the object managing the resource. In the emplacement functions, perfect-forwarding defers the creation of the resource-managing objects until they can be constructed in the container's memory, and that opens a window during which exceptions can lead to resource leaks. All standard containers are susceptible to this problem. When working with containers of resource-managing objects, you must take care to ensure that if you choose an emplacement function over its insertion counterpart, you're not paying for improved code efficiency with diminished exception safety.

The scenario above was deliberately contrived to illustrate the principle; in practice you should not write such code (Item 21). Instead, shut off the source of the problem entirely:

```c++
// push_back version
std::shared_ptr<Widget> spw(new Widget,  // create Widget and
                            killWidget); // have spw manage it
ptrs.push_back(std::move(spw));          // add spw as rvalue

// emplace_back version
std::shared_ptr<Widget> spw(new Widget, killWidget);
ptrs.emplace_back(std::move(spw));
```

In short:

ensure that nothing can intervene between acquiring a resource and turning it over to a resource-managing object.

Exception 2: direct initialization and explicit constructors

```c++
std::vector<std::regex> regexes;

regexes.emplace_back(nullptr); // compiles, but undefined behavior!
                               // add nullptr to container of regexes?
std::regex r = nullptr;        // error! won't compile
regexes.push_back(nullptr);    // error! won't compile
```

In the call to emplace_back, however, we’re not claiming to pass a std::regex object. Instead, we’re passing a constructor argument for a std::regex object. That’s not considered an implicit conversion request. Rather, it’s viewed as if you’d written this code:

```c++
std::regex r(nullptr); // compiles but has undefined behavior
```

The std::regex constructor taking a const char* pointer requires that the pointed-to string comprise a valid regular expression; a null pointer fails that requirement, hence the undefined behavior.

```c++
std::regex r1 = nullptr; // error! won't compile
std::regex r2(nullptr);  // compiles
```

The syntax used to initialize r1 (employing the equals sign) corresponds to copy initialization. In contrast, r2 (with the parentheses, although braces may be used instead) yields direct initialization.

Copy initialization is not permitted to use explicit constructors. Direct initialization is.
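
A small illustration (Meters is a type invented for this sketch) of how the copy- vs. direct-initialization distinction surfaces with containers:

```c++
#include <vector>

struct Meters {
    explicit Meters(double v) : value(v) {} // explicit: no implicit conversions
    double value;
};

int main()
{
    // Meters m1 = 5.0;  // error! copy initialization can't use explicit ctor
    Meters m2(5.0);      // OK: direct initialization can

    std::vector<Meters> v;
    // v.push_back(5.0); // error! needs an implicit conversion to Meters
    v.emplace_back(5.0); // OK: 5.0 is forwarded to the Meters constructor
}
```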

The lesson to take away is that when you use an emplacement function, be especially careful to make sure you’re passing the correct arguments, because even explicit constructors will be considered by compilers as they try to find a way to interpret your code as valid.

Things to Remember

  • In principle, emplacement functions should sometimes be more efficient than their insertion counterparts, and they should never be less efficient.
  • In practice, they’re most likely to be faster when
    • (1) the value being added is constructed into the container, not assigned;
    • (2) the argument type(s) passed differ from the type held by the container;
    • (3) the container won’t reject the value being added due to it being a duplicate.
  • Emplacement functions may perform type conversions that would be rejected by insertion functions.