Microbenchmarking

Microbenchmarking is used when attempting to optimize a function or small block of code. This involves creating a program that runs the code to be optimized in isolation from the rest of the program. The process follows these steps:

  1. Find a hot spot that needs tuning, preferably using a profiler.
  2. Separate it from the rest of the code and create an isolated benchmark.
  3. Optimize the microbenchmark using a benchmarking framework.
  4. Integrate the newly optimized code into the program and measure again to see if the optimization is relevant when the code runs in the context of the full program.

Estimating the overall speedup gained from a micro-optimization can be done with the help of Amdahl’s law:

Speedup = 1 / ((1 - P) + (P / S))


  • Speedup is the overall speedup of the program.
  • P is the proportion of total execution time spent in the optimized code.
  • S is the speedup achieved for that part in isolation.

Setting P = 0 with S = 5 shows that a 5x improvement to code the program spends no time in has no impact on the overall execution time (an overall speedup of 1). Setting P = 0.01 with S = 2 gives an overall speedup of only about 1.005.

Example in C++

The following example uses Google Benchmark (https://github.com/google/benchmark).

  1. First, I cloned the benchmark repository to my local machine:
git clone https://github.com/google/benchmark.git
  2. I navigated to the directory and ran these commands to build the framework and install it into a C++ library directory I keep on my local machine:
cd benchmark
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/Desktop/Libraries
make install
  3. Next, I started a new C++ project consisting of two files (CMakeLists.txt and main.cpp). My CMakeLists.txt was very basic:
cmake_minimum_required(VERSION 3.0)
project(benchmark-test) # when configuring this project, I point CMake at my library directory with -DCMAKE_PREFIX_PATH=$HOME/Desktop/Libraries so find_package can locate it

find_package(benchmark REQUIRED)

add_executable(benchmark_test main.cpp)
target_link_libraries(benchmark_test benchmark::benchmark)
  4. Lastly, I wrote my C++ file for benchmarking. For this example I benchmarked a linear and a binary search over a sorted vector. These are the two functions I am testing (and microbenchmarking against each other):
bool linear_search(const std::vector<int>& v, int key) {
    for (size_t i = 0; i < v.size(); i++) {
        if (v[i] == key) {
            return true;
        }
    }
    return false;
}
bool binary_search(const std::vector<int>& v, int key) {
    int low = 0;
    int high = static_cast<int>(v.size()) - 1;
    while (low <= high) {
        const int mid = low + (high - low) / 2; // avoids overflow of low + high
        if (v[mid] < key) {
            low = mid + 1;
        } else if (v[mid] > key) {
            high = mid - 1;
        } else {
            return true;
        }
    }
    return false;
}

I have this function to build out a vector for testing:

auto gen_vec(int n) {
    std::vector<int> v;
    v.reserve(n);
    for (int i = 0; i < n; i++) {
        v.push_back(i);
    }
    return v;
}

Next I need to write a wrapper function for each piece of code I am testing. This function needs to be static and should take a benchmark::State& parameter. The state supplies the input size via state.range(0), based on the ranges I give the framework later at registration. Run a for loop over the state, wrapping each call in benchmark::DoNotOptimize() so the compiler cannot discard the result. Finally, the framework can estimate the time complexity if we call SetComplexityN() on the state.

static void bm_linear_search(benchmark::State& state) {
    auto n = state.range(0);
    auto v = gen_vec(n);

    for (auto _ : state) {
        benchmark::DoNotOptimize(linear_search(v, n));
    }

    state.SetComplexityN(n);
}

static void bm_binary_search(benchmark::State& state) {
    auto n = state.range(0);
    auto v = gen_vec(n);

    for (auto _ : state) {
        benchmark::DoNotOptimize(binary_search(v, n));
    }

    state.SetComplexityN(n);
}

When microbenchmarking, you will not write your own main function. Instead, pass each wrapper function to the BENCHMARK macro and then call BENCHMARK_MAIN(). By setting a RangeMultiplier and a Range, the framework will run each benchmark across a series of input sizes.
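The registration code itself is not shown above, so here is a sketch of what it could look like at the bottom of main.cpp (the 32-to-4096 range is my assumption, chosen to match the result sizes; adjust as needed):

```cpp
// Run each benchmark with input sizes 32, 64, ..., 4096 (doubling each time).
// Complexity() asks the framework to fit a Big O curve to the measurements,
// using the values reported via SetComplexityN().
BENCHMARK(bm_linear_search)->RangeMultiplier(2)->Range(32, 4096)->Complexity();
BENCHMARK(bm_binary_search)->RangeMultiplier(2)->Range(32, 4096)->Complexity();

// Expands to a main() that parses command-line flags and runs all
// registered benchmarks.
BENCHMARK_MAIN();
```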

  5. Once complete, I configure and build the project, then run the resulting executable:

cmake . -DCMAKE_PREFIX_PATH=$HOME/Desktop/Libraries
make
./benchmark_test

The results on my machine looked like this:

[Image: Google Benchmark output table showing timings per input size and the estimated Big O]

Not surprisingly, binary search is significantly faster. On a vector with 4096 items, the linear search took 15021 nanoseconds while the binary search took only 67. Information about the Big O is also included, which can be very helpful for more complex algorithms.

There is obviously a lot more that can be said about microbenchmarking and more that can be done with Google’s framework. But what has been said above is enough to hit the ground running.