# Double-ended vector - is it useful?

22 May 2016

I have frequently seen the question ‘How do I push an element to the front of a C++ vector?’. Usually the correct answer is to use deque instead, and that’s the end of it. But what if you also want contiguous storage? In this post I present a simple implementation of a double-ended vector, devector, and compare the performance to vector, deque, and a circular buffer.

Basic Concept

A std::vector is typically implemented as a memory buffer of a certain size, which is partially filled up with elements, beginning at the start of the buffer. The distance from the last element to the end of the buffer is the spare capacity of the vector, and determines how many elements we can insert before we need to reallocate. The fact that we only have spare capacity in the back end of the vector, means that we can only efficiently insert at the back (or close to it).

The most important change we need to make to allow for fast insertion at both ends, is to allow the devector to have spare capacity at both ends. This means we keep a pointer to the start of the elements, which can be different from the start of the underlying buffer. As long as there is spare capacity at both ends of the buffer, we can insert at both ends quickly.

Reaching the end

Inevitably, we will use up all spare capacity on one side of the buffer, and then how do we handle the next insertion? For std::vector this is trivial, when we reach the back of the buffer we have to reallocate, because there is never room at the front end. For devector, however, we have a few options when the other side of the buffer still has spare capacity:

1. Move elements one step toward the other side. This is equal to how front insertions are handled in std::vector, and is very slow. Using this strategy would mean that the devector would devolve to have $O(n)$ insertions at one end.
2. Move elements several steps towards the other side. This is more interesting, because of the nice asymptotic properties. The initial insert is still slow, but now the following inserts will be fast again. If we move the the elements such that the remaining spare capacity is equal on both sides, we only need to do this $O(\log n)$ times before the buffer is completely full. This is because each time we fill up on one side and move elements, the spare capacity is cut in half. Since we start with $O(n)$ spare capacity after the previous reallocation, it takes $O(\log n)$ times to get down to a constant amount of spare capacity.
3. Reallocating, and placing the elements in the middle of a new buffer. This is fast for a growing devector, because we don’t waste time moving elements back and forth in the same small buffer. It is bad for memory usage, though, because we reallocate to a larger buffer, even though we haven’t filled up the current buffer. It is also bad if we are pushing elements to one side, and popping them from the other, because then reaching the end doesn’t mean we need more space, it just means the elements should be moved.

The best option is a combination of 2 and 3. When there is still a lot of room left in the buffer, we should move elements toward the middle, and not reallocate straight away. However, as the buffer fills up, the gains from moving elements are diminishing, while the cost is increasing. Moving $O(n)$ elements $O(\log n)$ times takes $O(n \cdot \log n)$ work, and amortized on $O(N)$ inserts we get $O(\log n)$ time per element, which is unacceptable. A nice solution is to move elements if the size-to-capacity ratio is less than some limit $\alpha$, and reallocate if the ratio is larger.

I will now show that this tactic leads to amortized constant time inserts. After the previous reallocation, the devector is $\frac{\alpha}{\beta}$ full, where $\beta$ is the growth factor used for reallocations. For example, if the limit is 90% and the growth factor is 2, the devector will be approximately 45% full after a reallocation. The spare capacity right after the reallocation is $1-\frac{\alpha}{\beta}$ (65% in the example). If every move operation cuts the spare capacity in two, it takes $\log_2 (1-\frac{\alpha}{\beta}) / (1 - \alpha)$ move operations before we reach the limit again and should do a new reallocation. Since $\alpha$ and $\beta$ are constants, the number of move operations is constant. Moving $O(n)$ elements $O(1)$ times takes $O(n)$ work, and since it is amortized over $\Theta(n)$ inserts, we get $O(1)$ amortized time per insert.

Adapting to the input

It’s great to have $O(1)$ insertion time at both ends, but the constant factors can still be high, as we must move all elements several times between each reallocation. The problem is that no matter how the devector is used, we treat insertions at both ends as equally likely, so we try to keep equal amounts of spare capacity on both ends. What if we assumed that previous behaviour predicts future behaviour?. That’s often a reasonably assumption. If we keep track of how many insertions are done at both ends, we can give more spare capacity to the active end, thus cutting down on (or totally getting rid of) the move operations. As long as we always give $\Theta(n)$ spare capacity to both ends, the analysis above still holds.

For example, if we have reached the back end of the buffer after a series of push_backs, and the buffer is 70% percent full, an unadaptive tactic would give 15% spare capacity to both sides. We can now push_back 15% more elements until we reach the limit again. The buffer is now 85% full, and we need to move the elements again (assuming the reallocation limit is 90%).

If we use an adaptive tactic, we could give the front 5% spare capacity, and give 25% to the back. Then we can push_back 25% more before reaching the end. Now the buffer is 95% full, and we go straight to reallocation, without an additional move operation in between.

Code

The code for devector as well as the performance measurements is available at github/cppwhiletrue. There are probably still bugs in the code, but I think it’s correct enough for the measurements to be correct. I apologize for the non-portable time measurement code, but it was the simplest way to get high resolution timing with g++/mingw.

Results

All results reported are the median of 51 runs. The performance tests are run on an Intel Core i7-4600U @2.1GHz. I report results both from MSVC compiler and g++/mingw. The circular data structure used in the tests are a slightly adapted boost::circular_buffer_space_optimized.

Push back

In this test, I measured the time taken to push_back $N$ random integers into each of the structures. The performance of devector looks good for back insertion, only beaten by the libstdc++ version of std::deque. The difference between libstdc++ and msvc std::deque is likely due to the small node size used by msvc.

#### integer push back - g++/mingw (ns)

N vector devector deque circular
10 354.891 270.996 67.1919 319.998
20 420.227 349.695 98.1895 784.031
40 550.898 500.414 141.809 605.844
80 678.602 593.961 160.369 893.914
160 950.336 795.906 347.469 1395.81
320 1413.62 1078.04 501.898 2292.7
640 2381.8 1900.66 1173.08 4074.59
1280 4038.94 2886.66 2061.05 7460.19
2560 7555.19 5250.62 4110.22 14207.6
5120 14350.1 13566.1 8790.62 27750
10240 28985.2 22903.2 16821 54359.5
20480 55309.5 32786.8 32311.5 107578
40960 110619 64813 62532.5 249750
81920 222000 133428 125825 429935
163840 561082 258113 243287 911189
327680 1.3358e+006 710096 498740 2.05426e+006
655360 3.02285e+006 1.99458e+006 1.11266e+006 4.65858e+006

#### integer push back - msvc (ns)

N vector devector deque circular
10 634.055 349.695 404.637 452.895
20 813.727 421.715 540.508 586.539
40 977.07 549.414 953.313 813.73
80 1306.72 787 1787.83 1137.44
160 1823.47 1098.83 3397.47 1758.13
320 2696.59 1496.78 5785.19 2619.38
640 3254.91 2827.25 11214 4597.28
1280 5725.81 4288.41 21905.4 9289.56
2560 10667.6 7578.94 44286 16250.8
5120 17201 14302.6 90092 31931.5
10240 33737 27940 174483 60631.5
20480 69945 54739.5 334901 119743
40960 145972 107959 707435 236825
81920 251650 236445 1.39282e+06 472510
163840 537894 471750 2.83849e+06 1.07123e+06
327680 1.90829e+06 1.23963e+06 5.72069e+06 2.53476e+06
655360 3.72991e+06 3.06619e+06 1.22613e+07 5.19382e+06

Push front

In this test, I measured the time taken to push_front $N$ random integers into each of the structures. The performance of devector also looks good for front insertion, again beaten by the libstdc++ version of std::deque.

#### integer push front - g++/mingw (ns)

N devector deque circular
10 417.258 109.512 348.953
20 537.531 102.457 415.773
40 687.512 141.066 576.145
80 662.266 276.564 882.031
160 920.641 351.922 1401.75
320 1265.14 589.508 2304.56
640 2001.66 1134.47 4145.84
1280 2981.69 2132.33 7483.94
2560 5630.75 4419.06 14255.1
5120 13970 10501.2 27750
10240 17676.2 20194.8 63482.5
20480 33071 34402.2 108339
40960 90092 67284 215537
81920 130387 129626 429935
163840 280161 269897 970870
327680 829079 572867 2.21468e+006
655360 2.20442e+006 1.23126e+006 4.86728e+006

#### integer push front - msvc (ns)

N devector deque circular
10 472.199 345.24 403.895
20 543.477 513.773 574.66
40 721.664 837.484 751.363
80 885 1633.39 1155.26
160 1223.56 3433.09 1645.28
320 1651.22 5904 2815.39
640 2922.28 12592 4846.75
1280 4822.97 23948.6 7982.88
2560 8363 48657.5 16678.5
5120 15965.5 97315 32026.5
10240 28700 187407 62722.5
20480 59681 349726 112140
40960 109099 716939 221239
81920 231883 1.35633e+06 484675
163840 427274 2.76398e+06 1.13243e+06
327680 1.43768e+06 6.00009e+06 2.37434e+06
655360 3.08025e+06 1.26753e+07 5.2037e+06

Push back / Pop front

In this test, I pushed $N$ elements to each structure, and then measured the time taken to pop_front and push_back $N$ random integers. This test emulates the steady state operation of a queue.

#### integer push/pop - g++/mingw (ns)

N devector deque circular
10 118.791 69.7908 48.6299
20 246.494 78.3286 106.171
40 201.205 118.05 203.432
80 295.496 374.195 394.984
160 662.266 687.512 1404.72
320 974.094 1603.7 1419.56
640 1728.43 3017.31 4965.5
1280 3243.03 3777.59 5630.75
2560 6319.75 7507.69 11261.5
5120 11451.6 20622.4 42195
10240 27179.8 44571 86291
20480 53979.5 90092.5 89712
40960 96174 115941 180565
81920 261534 233784 363031
163840 496079 481253 833261
327680 1.06666e+006 991018 1.63877e+006
655360 2.4268e+006 2.04628e+006 3.63982e+006

#### integer push/pop - msvc (ns)

N devector deque circular
10 112.853 194.151 98.375
20 132.899 333.361 205.66
40 206.402 590.992 392.758
80 296.238 1155.26 714.242
160 507.836 2506.52 1380.96
320 1042.41 4977.41 2732.22
640 1657.16 9099.5 5452.59
1280 3290.56 19719.6 10881.4
2560 6581.13 39629.3 25089
5120 13209.8 72986 46756.8
10240 27464.8 158137 89712
20480 54929.5 304490 191209
40960 109479 612401 611641
81920 246709 1.27916e+06 754952
163840 465668 2.50739e+06 1.52131e+06
327680 988357 5.41734e+06 3.12435e+06
655360 2.23863e+06 1.1108e+07 6.1396e+06

Summary

If you are looking for a std::vector-like structure with ability to insert/erase at both ends, or need a double-ended queue structure with contiguous storage, I think a double-ended vector is a good alternative.

I think a nice use case is if you need to interact with low-level api’s which takes contiguous ranges, and you don’t want to take the cost of copying the elements to a std::vector before you call the api.

Related Material

Update: When I wrote this post, I was sure someone had already made similar things, as the basic idea is pretty simple. Here are a couple examples (Please point me to others):

• QList, used by Qt, uses the same basic idea.
• Orson Peters implemented a very similar structure, also called devector, which can be found here.