Introduction

In today’s fast-paced world, harnessing the power of concurrency is essential for maximizing the performance of software applications. C++, with its robust suite of features for managing multiple tasks simultaneously, stands as a formidable tool in a developer’s arsenal. Here I attempt to demystify the complexities of concurrent programming in C++, offering a guide to understanding and implementing threads, locks, and asynchronous operations effectively. Whether you’re looking to improve the responsiveness of your applications or to fully leverage multi-core processors, mastering concurrency in C++ opens up a world of possibilities. Let’s dive into the nuances of C++ concurrency, starting with the basics and moving towards more advanced concepts.

Threads

A running program will always have at least one thread. When the main function is called at the start of your program, it is executed on the main thread. We are not limited to one thread and can quickly spin up multiple threads like so:

C++
#include <chrono>
#include <iostream>
#include <thread>

void print() {
  std::this_thread::sleep_for(std::chrono::seconds{1});
  std::cout << "Thread ID: " << std::this_thread::get_id() << "\n";
}

int main() {
  auto ti = std::thread{print};
  ti.join();
  std::cout << "Thread ID: " << std::this_thread::get_id() << "\n";
}

Here we spin up a thread distinct from the main thread to handle the print() function. The join() function halts our main function until the thread has finished. We could instead call ti.detach() if we wish for the main function to continue regardless of the state of the ti thread, which in this case will likely result in the program terminating before print() completes (a short sketch of the detached case follows the next example). If you are among the lucky few who can use C++20, we now have std::jthread, which is the same as a normal thread but with the added benefit of RAII: when the jthread reaches the end of its scope, it sends a stop request and joins the thread on destruction.

C++
#include <chrono>
#include <iostream>
#include <thread>

void print() {
  std::this_thread::sleep_for(std::chrono::seconds{1});
  std::cout << "Thread ID: " << std::this_thread::get_id() << "\n";
}

int main() {
  auto joinable_thread = std::jthread{print};
  std::cout << "Thread ID: " << std::this_thread::get_id() << "\n";
} // This is ok, the jthread will join automatically
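
As promised, here is a minimal sketch of the detached case (my own illustration, not one of the original examples). Without the extra sleep in main, the program would likely exit before the detached thread gets a chance to print:

C++
#include <chrono>
#include <iostream>
#include <thread>

void print() {
  std::this_thread::sleep_for(std::chrono::seconds{1});
  std::cout << "Thread ID: " << std::this_thread::get_id() << "\n";
}

int main() {
  auto ti = std::thread{print};
  ti.detach(); // main no longer waits for ti; the thread runs on its own

  // Give the detached thread time to finish; if main returns first,
  // the detached thread never gets to print.
  std::this_thread::sleep_for(std::chrono::seconds{2});
}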

You can also more easily stop a thread designed to run until asked to stop as follows:

C++
#include <chrono>
#include <iostream>
#include <thread>

void print(std::stop_token stoken) {
  while (!stoken.stop_requested()) {
    std::cout << "Thread ID: " << std::this_thread::get_id() << "\n";
    std::this_thread::sleep_for(std::chrono::seconds{1});
  }
}

int main() {
  auto joinable_thread = std::jthread{print};
  std::cout << "Thread ID: " << std::this_thread::get_id() << "\n";
  std::this_thread::sleep_for(std::chrono::seconds{5});
  joinable_thread.request_stop();
}

Calling request_stop ends the loop in the print() function, at which point the thread joins, and everyone is happy as our program exits.

What we have shown so far is a simple passing of a function with zero arguments to a thread to be executed. But that is quite limiting. We need to be able to pass data into our thread. There are a lot of different ways to accomplish this; let’s start with some simple approaches. Any class that overloads the function call operator, operator(), can be passed into a thread, and the thread will automatically call this operator when it starts. Any state within that class is now accessible to the thread:

C++
#include <iostream>
#include <thread>

class Vehicle {
public:
  explicit Vehicle(int doors) : _doors(doors) {}
  void operator()() {
    std::cout << "This car has: " << _doors << " doors\n";
  }

private:
  int _doors = 4;
};

int main() {
  std::thread t1{ Vehicle(4) }; // Use uniform initialization when passing an object to a thread
  t1.join();
}

Our thread will print out the correct number. We can also pass data into our thread by using a lambda that uses its capture semantics to bring data with it into the thread:

C++
#include <iostream>
#include <thread>

int main() {
  int test = 12;
  auto l1 = [test]() {
    std::cout << test << std::endl;
  };
  std::thread t1(l1); // captures test by value
  t1.join();

  int test2 = 13;
  auto l2 = [&test2]() {
    test2++;
    std::cout << test2 << std::endl;
  };
  std::thread t2(l2); // captures test2 by reference
  t2.join();
}

We now have the opportunity to take in data by value or reference. This, though, is still very limiting. Luckily, std::thread has a constructor that supports variadic templates, which opens up a lot more opportunities for us.

C++
#include <chrono>
#include <iostream>
#include <string>
#include <thread>

void printID(int id) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    std::cout << "ID = " << id << std::endl;
}

void printIDAndName(int id, std::string name) {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    std::cout << "ID = " << id << ", name = " << name << std::endl;
}

int main() {
    int id = 0; // Define an integer variable

    // starting threads using variadic templates
    std::thread t1(printID, id);
    std::thread t2(printIDAndName, ++id, "MyString");
    // std::thread t3(printIDAndName, ++id); // this produces a compiler error: too few arguments
    // t3.join();

    // wait for threads before returning
    t1.join();
    t2.join();

    return 0;
}

Here we can easily pass in our function and an arbitrary number of arguments, so long as they match the function’s parameters. This is a powerful feature, but we must be aware that, by default, each argument is copied into the thread if it is an lvalue and moved if it is an rvalue. If we are passing an lvalue and we don’t want to pay for a copy, we need to explicitly move it using std::move() (a short sketch follows the std::ref example below). But what if we really want to pass in the variable as a reference that the thread can modify?

C++
#include <chrono>
#include <iostream>
#include <string>
#include <thread>

void printName(std::string &name, int waitTime) {
    std::this_thread::sleep_for(std::chrono::milliseconds(waitTime));
    name += " (from Thread)";
    std::cout << name << std::endl;
}

int main() {
    std::string name("MyThread");

    // starting thread
    std::thread t(printName, std::ref(name), 50);

    // wait for thread before returning
    t.join();

    // print name from main
    name += " (from Main)";
    std::cout << name << std::endl;

    return 0;
}

We need to pass it using std::ref().
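
And here is the promised sketch of the move case (my own illustration, not one of the original examples): when we have an lvalue we no longer need, std::move() transfers it into the thread without paying for a copy.

C++
#include <iostream>
#include <string>
#include <thread>
#include <utility>

void printName(std::string name) {
    std::cout << name << std::endl;
}

int main() {
    std::string name("MyThread");

    // name is an lvalue, so without std::move it would be copied into the
    // thread; moving it transfers the string instead
    std::thread t(printName, std::move(name));
    t.join();

    // name is now in a valid but unspecified (likely empty) state
    return 0;
}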

So this is great, but C++ often deals with objects, which means we often want a thread to execute a member function on a particular object. We can do that:

C++
#include <iostream>
#include <thread>

class Vehicle {
public:
    Vehicle() : _id(0) {}
    void addID(int id) { _id = id; }
    void printID() {
        std::cout << "Vehicle ID=" << _id << std::endl;
    }

private:
    int _id;
};

int main() {
    // create thread
    Vehicle v1, v2;
    std::thread t1 = std::thread(&Vehicle::addID, v1, 1); // v1 is copied, so the thread modifies the copy
    std::thread t2 = std::thread(&Vehicle::addID, &v2, 2); // &v2 passes a pointer, so the thread modifies v2 itself

    // wait for threads to finish
    t1.join();
    t2.join();

    // print Vehicle id
    v1.printID(); // still prints ID=0
    v2.printID(); // prints ID=2

    return 0;
}

This is similar to what we saw above, except we need the pointer-to-member syntax (&Vehicle::addID) to let the thread know which function to call. Be careful about how the object itself is passed: v1 is passed by value, so the thread operates on a copy and v1 still prints ID=0, while &v2 passes the object’s address, so v2 prints ID=2. Finally, since we are good modern C++ engineers, we will need to know how to pass objects within smart pointers to our threads:

C++
#include <iostream>
#include <memory>
#include <thread>

int main() {
    // create thread
    auto v = std::make_shared<Vehicle>();
    std::thread t = std::thread(&Vehicle::addID, v, 1); // call member function on the object v points to

    // wait for thread to finish
    t.join();

    // print Vehicle id
    v->printID();

    return 0;
}

Finally, we need to go over how multiple threads can be handled:

C++
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    // create threads
    std::vector<std::thread> threads;
    for (size_t i = 0; i < 10; ++i) {
        // create new thread from a lambda
        threads.emplace_back([i]() {

            // wait for a certain amount of time
            std::this_thread::sleep_for(std::chrono::milliseconds(10 * i));

            // perform work
            std::cout << "Hello from Worker thread #" << i << std::endl;
        });
    }

    // do something in main()
    std::cout << "Hello from Main thread" << std::endl;

    // call join on all thread objects using a range-based loop
    for (auto &t : threads)
        t.join();

    return 0;
}

We can store our threads in a vector and then loop through that vector to call join() on each thread. It should be noted that std::thread has no copy constructor, only a move constructor, so the vector must take ownership by move. emplace_back constructs the thread in place; push_back also works, but only with an rvalue (see the sketch below).
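
A minimal sketch of the move-only nature of std::thread (my own illustration, not from the original example):

C++
#include <thread>
#include <utility>
#include <vector>

int main() {
    std::vector<std::thread> threads;
    std::thread t([]() { /* work */ });

    // threads.push_back(t);          // error: std::thread is not copyable
    threads.push_back(std::move(t));  // ok: ownership is moved into the vector
    threads.emplace_back([]() { /* work */ }); // ok: constructed in place

    for (auto &th : threads) th.join();
    return 0;
}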

Protecting Data

Concurrent programming is an amazing way to speed up our software; it is also an amazing way to shoot ourselves in the foot. Consider this code example:

C++
#include <cassert>
#include <iostream>
#include <thread>

int counter = 0;

void increment_counter(int n) {
  for (int i = 0; i < n; i++) {
    counter++;
  }
}

int main() {
  constexpr auto n = int{100000000};
  {
    auto t1 = std::jthread{increment_counter, n};
    auto t2 = std::jthread{increment_counter, n};
  }
  std::cout << counter << "\n";
  assert(counter == (n * 2));
}

This program will likely fail the assert. Both threads share the same counter, and counter++ is not a single operation: it is a read, an increment, and a write. If both threads read the same value before either writes back, two concurrent increments collapse into one. We can protect ourselves by using std::mutex.

C++
#include <cassert>
#include <iostream>
#include <mutex>
#include <thread>

int counter = 0;
std::mutex counter_mutex;

void increment_counter(int n) {
  for (int i = 0; i < n; i++) {
    auto lock = std::unique_lock<std::mutex>{counter_mutex};
    counter++;
  }
}

int main() {
  constexpr auto n = int{100000000};
  {
    auto t1 = std::jthread{increment_counter, n};
    auto t2 = std::jthread{increment_counter, n};
  }
  std::cout << counter << "\n";
  assert(counter == (n * 2));
}

Ok, so what is going on here? Whenever we reach a critical section of our code that could be the source of a data race, we lock a mutex. When another thread reaches that same mutex, it checks whether the mutex is already locked; if it is, the thread waits until the mutex is unlocked before proceeding. This means only one thread is ever inside the section of code the mutex protects at a time. In my example above, I place the mutex within a unique_lock, which acts as our RAII object and ensures we do not forget to unlock the mutex.

But mutexes open us up to a new issue: deadlocks. Sometimes our program needs to hold more than one mutex at a given time. This leaves us open to the possibility that two threads each hold one lock while waiting forever for the other to release theirs:

C++
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mutex1;
std::mutex mutex2;

void thread1() {
    std::cout << "Thread 1: Locking mutex1" << std::endl;
    std::lock_guard<std::mutex> lock1(mutex1);
    std::this_thread::sleep_for(std::chrono::milliseconds(100));

    std::cout << "Thread 1: Trying to lock mutex2" << std::endl;
    std::lock_guard<std::mutex> lock2(mutex2);

    std::cout << "Thread 1: Locked both mutexes" << std::endl;
}

void thread2() {
    std::cout << "Thread 2: Locking mutex2" << std::endl;
    std::lock_guard<std::mutex> lock1(mutex2);
    std::this_thread::sleep_for(std::chrono::milliseconds(100));

    std::cout << "Thread 2: Trying to lock mutex1" << std::endl;
    std::lock_guard<std::mutex> lock2(mutex1);

    std::cout << "Thread 2: Locked both mutexes" << std::endl;
}

int main() {
    std::thread t1(thread1);
    std::thread t2(thread2);

    t1.join();
    t2.join();

    return 0;
}

Here thread1 might be waiting on thread2 to unlock mutex2, and thread2 might be waiting on thread1 to release mutex1. The solution is to use std::lock() when acquiring multiple mutexes:

C++
#include <iostream>
#include <mutex>

std::mutex mutex1;
std::mutex mutex2;

void thread1() {
    std::cout << "Thread 1: Trying to lock both mutexes" << std::endl;
    std::unique_lock<std::mutex> lock1(mutex1, std::defer_lock);
    std::unique_lock<std::mutex> lock2(mutex2, std::defer_lock);
    std::lock(lock1, lock2);

    std::cout << "Thread 1: Locked both mutexes" << std::endl;
}

void thread2() {
    std::cout << "Thread 2: Trying to lock both mutexes" << std::endl;
    std::unique_lock<std::mutex> lock1(mutex1, std::defer_lock);
    std::unique_lock<std::mutex> lock2(mutex2, std::defer_lock);
    std::lock(lock1, lock2);

    std::cout << "Thread 2: Locked both mutexes" << std::endl;
}

Here std::lock() uses a deadlock-avoidance algorithm to acquire both mutexes, no matter which order the threads arrive in, so neither thread can end up holding one lock while waiting forever for the other. One of the great things about unique_lock is you can tell it not to lock immediately by passing std::defer_lock, and then lock it later with .lock() — which is what lets std::lock() take over the locking here.
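
As an aside (this goes beyond the examples above, but is worth knowing): since C++17, std::scoped_lock wraps this defer-and-lock dance into a single RAII object that locks all of its mutexes deadlock-free. A minimal sketch:

C++
#include <mutex>
#include <thread>

std::mutex mutex1;
std::mutex mutex2;

void worker() {
    // locks both mutexes deadlock-free and unlocks them on scope exit
    std::scoped_lock lock(mutex1, mutex2);
    // ... critical section ...
}

int main() {
    std::thread t1(worker);
    std::thread t2(worker);
    t1.join();
    t2.join();
}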

Passing Data Between Threads

So far, we have relied on global/shared variables between threads, which is not a great approach. A better way is to have our threads communicate their results directly back into the scope that owns the thread. This can be done with promises and futures. Let’s first see how to do this with threads.

C++
#include <future>
#include <iostream>
#include <thread>

// Function to perform some computation and set the result in a promise
void compute(std::promise<int> result_promise) {
    int result = 0;
    for (int i = 1; i <= 10; ++i) {
        result += i * i; // Example computation: sum of squares from 1 to 10
    }
    result_promise.set_value(result); // Set the result in the promise
}

int main() {
    // Create a promise and a future
    std::promise<int> promise;
    std::future<int> future = promise.get_future();

    // Launch a thread to perform the computation
    std::thread t(compute, std::move(promise));

    // Retrieve the result from the future
    int result = future.get(); // get() waits for the result and retrieves it

    // Join the thread
    t.join();

    // Output the result
    std::cout << "The sum of squares from 1 to 10 is: " << result << std::endl;

    return 0;
}

Here we utilize std::promise and std::future as a pair. We pass the promise into our function (which requires a parameter to accept it), and we hold onto the future provided by promise.get_future() to retrieve the data when it is ready. When our thread is ready to pass data back, it calls set_value() on the promise. Meanwhile, our main function blocks in future.get() until the data arrives. We can now modify this to handle multiple threads:

C++
#include <future>
#include <iostream>
#include <thread>
#include <vector>

// Function to perform some computation and set the result in a promise
void compute(int start, int end, std::promise<int> result_promise) {
    int result = 0;
    for (int i = start; i < end; ++i) {
        result += i * i; // Example computation: sum of squares
    }
    result_promise.set_value(result); // Set the result in the promise
}

int main() {
    // Number of threads
    const int num_threads = 4;

    // Range of values to compute
    const int range_size = 100;
    const int step = range_size / num_threads;

    // Vectors to hold the threads and their promises/futures
    std::vector<std::thread> threads;
    std::vector<std::promise<int>> promises(num_threads);
    std::vector<std::future<int>> futures;

    // Launch threads
    for (int i = 0; i < num_threads; ++i) {
        int start = i * step;
        int end = (i + 1) * step;
        futures.push_back(promises[i].get_future());
        threads.emplace_back(compute, start, end, std::move(promises[i]));
    }

    // Handle the results from each thread
    int total_sum = 0;
    for (auto& future : futures) {
        total_sum += future.get(); // get() waits for the result and retrieves it
    }

    // Join the threads
    for (auto& thread : threads) {
        thread.join();
    }

    std::cout << "Total sum of computations: " << total_sum << std::endl;

    return 0;
}

Managing promises and futures with threads can quickly become challenging and requires a decent amount of boilerplate. We could instead use std::async(), which works really well out of the box with promises and futures.

C++
#include <future>
#include <iostream>
#include <stdexcept>

double divideByNumber(double num, double denom) {
    if (denom == 0)
        throw std::runtime_error("Exception from thread#: Division by zero!");

    return num / denom;
}

int main() {
    // use async to start a task
    double num = 42.0, denom = 2.0;
    std::future<double> ftr = std::async(divideByNumber, num, denom);

    // retrieve result within try-catch-block
    try {
        double result = ftr.get();
        std::cout << "Result = " << result << std::endl;
    } catch (const std::runtime_error &e) {
        std::cout << e.what() << std::endl;
    }

    return 0;
}

std::async takes arguments similar to std::thread (plus an optional launch-policy argument at the beginning) and returns a future that will hold the function’s return value. But unlike using threads, we no longer need to construct or manage a std::promise; even exceptions thrown inside the task are captured and rethrown from get(), which is why the try-catch above works. This simplifies the function used in the thread significantly.

Also, unlike a thread, which will always run concurrently, std::async may decide at runtime whether its function should run asynchronously on a new thread or lazily on the calling thread, depending on the current machine resources. Developers can override this by passing std::launch::deferred (run lazily on the thread that calls get()) or std::launch::async (always run on a new thread); a short sketch follows the next example. Similar to threads, we can also spin up a bunch of async objects as needed:

C++
#include <future>
#include <iostream>
#include <stdexcept>
#include <vector>

double divideByNumber(double num, double denom) {
    if (denom == 0)
        throw std::runtime_error("Exception from thread#: Division by zero!");

    return num / denom;
}

int main() {
    std::vector<std::future<double>> futures;
    int nThreads = 5;
    for (int i = 0; i < nThreads; ++i) {
        futures.emplace_back(std::async(divideByNumber, 5000.0, static_cast<double>(i)));
    }

    // wait for tasks to complete
    double final_sum = 0.0;
    for (auto &ftr : futures) {
        try {
            final_sum += ftr.get(); // rethrows the division-by-zero exception for i == 0
        } catch (const std::runtime_error &e) {
            std::cout << e.what() << std::endl;
        }
    }
    std::cout << "Final sum: " << final_sum << std::endl;
    return 0;
}
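
And here is the promised sketch of forcing a launch policy (my own illustration, not one of the original examples):

C++
#include <future>
#include <iostream>

int task() { return 42; }

int main() {
    // Always runs on a new thread
    auto f1 = std::async(std::launch::async, task);

    // Runs lazily on this thread, but only once get() or wait() is called
    auto f2 = std::async(std::launch::deferred, task);

    std::cout << f1.get() << " " << f2.get() << std::endl;
    return 0;
}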

std::async abstracts away a lot of the difficulties of using std::thread, but at the cost of strict control. As a general rule of thumb: if you need performance-critical threading or fine-grained control, threads may be a better solution; otherwise, use async for simpler things like I/O-bound tasks.

One further example I would like to go over is using a manager object to queue asynchronous operations:

C++
#include <algorithm>
#include <chrono>
#include <future>
#include <iostream>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

class Vehicle {
public:
    Vehicle(int id) : _id(id) {}
    int getID() { return _id; }

private:
    int _id;
};

class WaitingVehicles {
public:
    WaitingVehicles() {}

    bool dataIsAvailable() {
        std::lock_guard<std::mutex> myLock(_mutex);
        return !_vehicles.empty();
    }

    Vehicle popBack() {
        // perform vector modification under the lock
        std::lock_guard<std::mutex> uLock(_mutex);

        // remove last vector element from queue
        Vehicle v = std::move(_vehicles.back());
        _vehicles.pop_back();

        return v; // will not be copied due to return value optimization (RVO) in C++
    }

    void pushBack(Vehicle &&v) {
        // simulate some work
        std::this_thread::sleep_for(std::chrono::milliseconds(100));

        // perform vector modification under the lock
        std::lock_guard<std::mutex> uLock(_mutex);

        // add vehicle to queue
        std::cout << "   Vehicle #" << v.getID() << " will be added to the queue" << std::endl;
        _vehicles.emplace_back(std::move(v));
    }

private:
    std::vector<Vehicle> _vehicles; // list of all vehicles waiting to enter this intersection
    std::mutex _mutex;
};

int main() {
    // create monitor object as a shared pointer to enable access by multiple threads
    auto queue = std::make_shared<WaitingVehicles>();

    std::cout << "Spawning threads..." << std::endl;
    std::vector<std::future<void>> futures;
    for (int i = 0; i < 10; ++i) {
        // create a new Vehicle instance and move it into the queue
        Vehicle v(i);
        futures.emplace_back(std::async(std::launch::async, &WaitingVehicles::pushBack, queue, std::move(v)));
    }

    std::cout << "Collecting results..." << std::endl;
    int processed = 0;
    while (processed < 10) { // loop until every vehicle has been consumed
        if (queue->dataIsAvailable()) {
            Vehicle v = queue->popBack();
            std::cout << "   Vehicle #" << v.getID() << " has been removed from the queue" << std::endl;
            ++processed;
        }
    }

    std::for_each(futures.begin(), futures.end(), [](std::future<void> &ftr) {
        ftr.wait();
    });

    std::cout << "Finished processing queue" << std::endl;

    return 0;
}

Our data is safer since every access goes through the same WaitingVehicles class, which correctly uses a mutex to protect against data races. It also queues our data so it can be handled on our main thread.

C++20 Additional Synchronization Primitives

There are three new additions to concurrency in C++20 worth mentioning. The first is Latches. A latch allows for synchronizing multiple threads by providing a point they all must arrive at. It is a glorified decrementing counter where all threads decrement the latch and then wait for the latch to reach zero before moving on.

C++
#include <latch>
#include <thread>
#include <vector>

void do_work() { /* ... */ }
void do_more_work() { /* ... */ }

int main() {
  constexpr auto n_threads = 2;
  std::latch initialized = std::latch{n_threads};
  std::vector<std::thread> threads = std::vector<std::thread>{};
  for (int i = 0; i < n_threads; i++) {
    threads.emplace_back([&]() {
      do_work();
      initialized.arrive_and_wait(); // decrement and wait
      do_more_work();
    });
  }
  initialized.wait(); // block until this latch is zero
  for (auto&& t : threads) {
    t.join();
  }
}

Latches also support:

C++
std::latch l = std::latch{8};
l.count_down(); // decrement but don't wait until zero
l.wait(); // don't decrement but wait until zero
if (l.try_wait()) { // test to see if latch is zero without blocking
  // latch is zero
}

The next one is barriers. They are similar to latches but with two additions: a barrier can be reused, and it can run a completion function when it reaches zero. To use it, pass the initial value and our completion function/lambda:

C++
auto bar = std::barrier{8, []() {
  std::cout << "All threads arrived at barrier\n";
}};

A barrier supports a similar set of operations to a latch, which means that once bar.arrive_and_wait() has been called 8 times, the message “All threads arrived at barrier” will be displayed. Here is an example of a program that simulates rolling 6 dice and counting how many turns it takes to roll all 6s:

C++
#include <algorithm>
#include <array>
#include <barrier>
#include <iostream>
#include <random>
#include <thread>
#include <vector>

int roll_die() {
    static thread_local std::mt19937 generator(std::random_device{}());
    std::uniform_int_distribution<int> distribution(1, 6);
    return distribution(generator);
}

int main() {
  bool done = false;
  auto dice = std::array<int, 6>{};
  auto threads = std::vector<std::thread>{};
  auto n_turns = 0;

  // our completion function when barrier reaches zero
  auto check_result = [&]() {
    ++n_turns;
    auto is_six = [](int i) { return i == 6; };
    done = std::all_of(dice.begin(), dice.end(), is_six);
  };
  auto bar = std::barrier{6, check_result};

  for (int i = 0; i < 6; i++) {
    threads.emplace_back([&, i]() {
      while (!done) {
        dice[i] = roll_die();
        bar.arrive_and_wait();
      }
    });
  }
  for (auto&& t : threads) {
    t.join();
  }
  std::cout << n_turns << std::endl;
}

Each time all six dice have been rolled, the barrier’s completion function is called, and we check whether all six dice are equal to 6. If they are, we set our done variable to true and finish up. It should be noted that the barrier automatically resets to its initial count after running the completion function, which is what allows all the dice threads to loop around and try again.

Finally, let’s review semaphores. Semaphores in C++ can be used to signal different states between different threads while controlling access to critical data. In the example below, a counting semaphore allows up to 5 threads to access a function at the same time, blocking all other threads until a spot opens up:

C++
#include <semaphore>

struct Request { /* ... */ }; // hypothetical request type for illustration

class Server {
public:
  void handle(const Request& req) {
    sem_.acquire(); // blocks unless fewer than 5 threads are inside do_handle
    do_handle(req);
    sem_.release(); // increments the count so another thread can call do_handle
  }

private:
  void do_handle(const Request& req) { /* ... */ }
  std::counting_semaphore<5> sem_{5};
};

Below is an example of a bounded buffer that has multiple threads reading (pop()) and writing (push()) to the buffer. To prevent data races, we will use two semaphores to synchronize our threads properly.

C++
#include <array>
#include <mutex>
#include <optional>
#include <semaphore>
#include <utility>

template <typename T, int N>
class BoundedBuffer {
  std::array<T, N> buf_;
  std::size_t read_pos_{};
  std::size_t write_pos_{};
  std::mutex m_;
  std::counting_semaphore<N> n_empty_slots_{N};
  std::counting_semaphore<N> n_full_slots_{0};

  void do_push(auto&& item) {
    n_empty_slots_.acquire(); // one less empty slot; blocks if the buffer is full
    try {
      auto lock = std::unique_lock{m_}; // use a mutex to lock our data
      buf_[write_pos_] = std::forward<decltype(item)>(item);
      write_pos_ = (write_pos_ + 1) % N;
    } catch (...) {
      n_empty_slots_.release(); // we failed to add to the buffer, so give the slot back
      throw;
    }
    n_full_slots_.release(); // everything was successful, we now have one more full slot
  }

public:
  void push(const T& item) { do_push(item); }
  void push(T&& item) { do_push(std::move(item)); }

  auto pop() {
    n_full_slots_.acquire(); // one less full slot; blocks if the buffer is empty
    auto item = std::optional<T>{};
    try {
      auto lock = std::unique_lock{m_};
      item = std::move(buf_[read_pos_]);
      read_pos_ = (read_pos_ + 1) % N;
    } catch (...) {
      n_full_slots_.release();
      throw;
    }
    n_empty_slots_.release();
    return std::move(*item);
  }
};

Using two semaphores, we are able to synchronize all of our threads so that we never try to pop() from an empty buffer or push() to a full one. This is challenging since our threads run non-deterministically.
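
To see it in action, here is a minimal producer/consumer sketch using the BoundedBuffer above (my own usage example, not from the original):

C++
#include <iostream>
#include <thread>

int main() {
    BoundedBuffer<int, 4> buffer;

    // producer: pushes ten items, blocking whenever the buffer is full
    std::thread producer([&]() {
        for (int i = 0; i < 10; ++i) buffer.push(i);
    });

    // consumer: pops ten items, blocking whenever the buffer is empty
    std::thread consumer([&]() {
        for (int i = 0; i < 10; ++i) std::cout << buffer.pop() << "\n";
    });

    producer.join();
    consumer.join();
}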

There is one more major aspect of C++ concurrency I want to cover, which is atomics. But I suspect there is enough content regarding atomics that it deserves its own post. Also, this post is getting rather long, so I think we can end here.

Conclusion

Our processors have already begun to hit a wall in how fast they can sequentially process information, and our need for faster programs certainly will not yield to this limiting factor. Instead, it is up to engineers to figure out how to utilize multiple threads and concurrent processes to speed up our programs. This is not an easy task by any means, but it is an important one. The C++ engineer has a lot of great tools in the toolbelt to make this happen. Writing robust concurrent software in C++ is doable, and you no longer need ice in your veins to make the attempt. But you must make sure you are using the right tools and that you understand them.
