Part of the Design Patterns series:

Introduction

C++ engineers like optimization techniques; it is in our blood. An interesting, although slightly counterintuitive, technique is small buffer optimization (SBO). One of our goals should be to program on the stack using value semantics anytime it makes sense. But in reality we are often required to allocate data on the heap for numerous reasons. This is often necessary, but unfortunate, since heap-allocated data comes with a performance penalty. For containers that need to store data, small buffer optimization tries to mitigate this hit by reserving a small amount of memory inside the container object itself, which lives on the stack whenever the container does, and using it instead of heap memory when possible. When only a small amount of data is stored in the container, this inline buffer is used; otherwise the data is stored on the heap.

I say this is slightly counterintuitive because much of what C++ engineers do to optimize their code involves reducing the memory footprint of their containers. A container with a smaller memory footprint tends to cause fewer cache misses at runtime, which can lead to significant performance improvements. But with small buffer optimization, we instead increase the size of our container, knowing that this extra memory might not even be used!

Implementation

Let’s dive into how this gets implemented. We are going to implement a small string class with buffer optimization. If the string it owns is 15 characters or fewer, we will store it locally in an array inside the object; otherwise, it will be stored on the heap and reached through a pointer. For small strings, we should expect a noticeable speedup at runtime since we do not need to allocate memory on the heap (expensive) or follow a pointer into the heap when reading the string (also expensive). Instead, for these small strings, the characters live inside the object itself, so no separate allocation is needed and the data sits right next to the rest of the object in memory.

We could increase the size of the local buffer to try to capture more use cases, but performance starts to degrade as the class grows with the larger buffer. Larger objects mean fewer of them fit in each cache line, so there are more cache misses, which can end up costing more than fetching data from the heap. Since heap storage only adds the size of a pointer (typically 8 bytes on a 64-bit platform) to the class, the class as a whole might be more cache friendly and perform better, even with the need to use heap memory.

C++
#include <cstddef>
#include <cstring>
#include <iostream>

class SmallString {
private:
    static constexpr std::size_t MAX_SIZE = 15; // Maximum size for the small buffer
    std::size_t size;
    char* data;                 // Points at buffer or at a heap allocation
    char buffer[MAX_SIZE + 1];  // Inline buffer, +1 for the null terminator

public:
    // Constructor
    SmallString(const char* str) : size(std::strlen(str)) {
        if (size <= MAX_SIZE) {
            // Use the internal buffer
            data = buffer;
        } else {
            // Allocate memory dynamically if size exceeds MAX_SIZE
            data = new char[size + 1];
        }
        std::strcpy(data, str);
    }

    // Copy constructor
    SmallString(const SmallString& other) : size(other.size) {
        if (size <= MAX_SIZE) {
            data = buffer;
        } else {
            data = new char[size + 1];
        }
        std::strcpy(data, other.data);
    }

    // Copy assignment is deleted to keep the example short. The
    // compiler-generated version would copy the raw pointer, so two
    // objects would end up sharing (and double-freeing) one allocation.
    SmallString& operator=(const SmallString&) = delete;

    // Destructor: only heap-allocated strings need to be freed
    ~SmallString() {
        if (size > MAX_SIZE) {
            delete[] data;
        }
    }

    // Helper function to display string
    void print() const {
        std::cout << "String: " << data << " (size: " << size << ")" << std::endl;
    }
};

int main() {
    SmallString s1("Hello");
    SmallString s2("This is a very long string that does not fit in the buffer");

    s1.print();
    s2.print();

    return 0;
}

Even with the benefits of SBO, it is unfortunate that every object carries memory that might never be used: the version above stores the pointer and the buffer side by side, yet a small string only needs the buffer and a large string only needs the pointer. Very wasteful. There is a way around this through the use of a union. A union in C++ is a user-defined type that declares several members, similar to a struct, but all of them share the same storage, so only one member can hold a value at any given time. Using a union lets us use the same memory for both our local buffer and our pointer to the heap, which is great, since without SBO we would need to store the pointer anyway. The buffer now only costs the bytes by which it exceeds the pointer, so we get most of it for free!

C++
#include <cstddef>
#include <cstring>
#include <iostream>

class SmallString {
private:
    static constexpr std::size_t MAX_SIZE = 15; // Maximum size for the small buffer
    union {
        char* data;  // Pointer to the dynamically allocated buffer for large strings
        char buffer[MAX_SIZE + 1];  // Inline buffer for small strings
    };
    std::size_t size;

    // The size tells us which union member is currently in use
    bool isSmall() const { return size <= MAX_SIZE; }

public:
    // Constructor
    SmallString(const char* str) : size(std::strlen(str)) {
        if (isSmall()) {
            // Use the small buffer
            std::memcpy(buffer, str, size + 1);
        } else {
            // Allocate memory for large strings
            data = new char[size + 1];
            std::memcpy(data, str, size + 1);
        }
    }

    // Destructor
    ~SmallString() {
        if (!isSmall()) {
            delete[] data;
        }
    }

    // Copy constructor
    SmallString(const SmallString& other) : size(other.size) {
        if (isSmall()) {
            std::memcpy(buffer, other.buffer, size + 1);
        } else {
            data = new char[size + 1];
            std::memcpy(data, other.data, size + 1);
        }
    }

    // Assignment operator
    SmallString& operator=(const SmallString& other) {
        if (this == &other) return *this;

        // Allocate before releasing anything, so that a failed
        // allocation leaves this object unchanged
        char* fresh = nullptr;
        if (!other.isSmall()) {
            fresh = new char[other.size + 1];
            std::memcpy(fresh, other.data, other.size + 1);
        }

        if (!isSmall()) {
            delete[] data;
        }

        size = other.size;
        if (isSmall()) {
            std::memcpy(buffer, other.buffer, size + 1);
        } else {
            data = fresh;
        }
        return *this;
    }

    // Print function for demonstration
    void print() const {
        std::cout << (isSmall() ? buffer : data) << std::endl;
    }
};

int main() {
    SmallString s1("Hello");
    SmallString s2("This is a very long string, definitely not small!");
    s1.print();
    s2.print();
    return 0;
}

If the SmallString holds more than 15 characters, the data member of the union is used; otherwise the buffer member is used. In both cases, the union has the same memory footprint as its largest member, which on typical platforms is buffer at 16 bytes. So in this case, we get the pointer for free, since it shares the buffer's storage.

Conclusion

This optimization is great when used correctly. In both my examples above, the SmallString class is still larger than it would have been had I not done SBO. This is fine if enough instantiations actually use the small buffer. But if every instantiation is so large that it must go on the heap, we are bloating our objects for no benefit. It is always important to test optimizations to ensure they actually help and do not in fact slow things down.