I have frequently seen the question ‘How do I push an element to the front of a C++ vector?’. Usually the correct answer is to use deque instead, and that’s the end of it. But what if you also want contiguous storage? In this post I present a simple implementation of a double-ended vector, devector
, and compare the performance to vector, deque, and a circular buffer.
Basic Concept
A std::vector
is typically implemented as a memory buffer of a certain size, which is partially filled up with elements, beginning at the start of the buffer. The distance from the last element to the end of the buffer is the spare capacity of the vector, and determines how many elements we can insert before we need to reallocate. The fact that we only have spare capacity in the back end of the vector, means that we can only efficiently insert at the back (or close to it).
The most important change we need to make to allow for fast insertion at both ends, is to allow the devector
to have spare capacity at both ends. This means we keep a pointer to the start of the elements, which can be different from the start of the underlying buffer. As long as there is spare capacity at both ends of the buffer, we can insert at both ends quickly.
Reaching the end
Inevitably, we will use up all spare capacity on one side of the buffer, and then how do we handle the next insertion? For std::vector
this is trivial, when we reach the back of the buffer we have to reallocate, because there is never room at the front end. For devector
, however, we have a few options when the other side of the buffer still has spare capacity:
- Move elements one step toward the other side. This is equal to how front insertions are handled in
std::vector
, and is very slow. Using this strategy would mean that thedevector
would devolve to have insertions at one end. - Move elements several steps towards the other side. This is more interesting, because of the nice asymptotic properties. The initial insert is still slow, but now the following inserts will be fast again. If we move the the elements such that the remaining spare capacity is equal on both sides, we only need to do this times before the buffer is completely full. This is because each time we fill up on one side and move elements, the spare capacity is cut in half. Since we start with spare capacity after the previous reallocation, it takes times to get down to a constant amount of spare capacity.
- Reallocating, and placing the elements in the middle of a new buffer. This is fast for a growing
devector
, because we don’t waste time moving elements back and forth in the same small buffer. It is bad for memory usage, though, because we reallocate to a larger buffer, even though we haven’t filled up the current buffer. It is also bad if we are pushing elements to one side, and popping them from the other, because then reaching the end doesn’t mean we need more space, it just means the elements should be moved.
The best option is a combination of 2 and 3. When there is still a lot of room left in the buffer, we should move elements toward the middle, and not reallocate straight away. However, as the buffer fills up, the gains from moving elements are diminishing, while the cost is increasing. Moving elements times takes work, and amortized on inserts we get time per element, which is unacceptable. A nice solution is to move elements if the size-to-capacity ratio is less than some limit , and reallocate if the ratio is larger.
I will now show that this tactic leads to amortized constant time inserts. After the previous reallocation, the devector
is full, where is the growth factor used for reallocations. For example, if the limit is 90% and the growth factor is 2, the devector
will be approximately 45% full after a reallocation. The spare capacity right after the reallocation is (65% in the example). If every move operation cuts the spare capacity in two, it takes move operations before we reach the limit again and should do a new reallocation. Since and are constants, the number of move operations is constant. Moving elements times takes work, and since it is amortized over inserts, we get amortized time per insert.
Adapting to the input
It’s great to have insertion time at both ends, but the constant factors can still be high, as we must move all elements several times between each reallocation. The problem is that no matter how the devector
is used, we treat insertions at both ends as equally likely, so we try to keep equal amounts of spare capacity on both ends. What if we assumed that previous behaviour predicts future behaviour?. That’s often a reasonably assumption. If we keep track of how many insertions are done at both ends, we can give more spare capacity to the active end, thus cutting down on (or totally getting rid of) the move operations. As long as we always give spare capacity to both ends, the analysis above still holds.
For example, if we have reached the back end of the buffer after a series of push_back
s, and the buffer is 70% percent full, an unadaptive tactic would give 15% spare capacity to both sides. We can now push_back
15% more elements until we reach the limit again. The buffer is now 85% full, and we need to move the elements again (assuming the reallocation limit is 90%).
If we use an adaptive tactic, we could give the front 5% spare capacity, and give 25% to the back. Then we can push_back
25% more before reaching the end. Now the buffer is 95% full, and we go straight to reallocation, without an additional move operation in between.
Code
The code for devector
as well as the performance measurements is available at github/cppwhiletrue. There are probably still bugs in the code, but I think it’s correct enough for the measurements to be correct. I apologize for the non-portable time measurement code, but it was the simplest way to get high resolution timing with g++/mingw.
Results
All results reported are the median of 51 runs. The performance tests are run on an Intel Core i7-4600U @2.1GHz. I report results both from MSVC compiler and g++/mingw. The circular
data structure used in the tests are a slightly adapted boost::circular_buffer_space_optimized
.
Push back
In this test, I measured the time taken to push_back
random integers into each of the structures. The performance of devector
looks good for back insertion, only beaten by the libstdc++ version of std::deque
. The difference between libstdc++ and msvc std::deque
is likely due to the small node size used by msvc.
integer push back - g++/mingw (ns)
N | vector | devector | deque | circular |
---|---|---|---|---|
10 | 354.891 | 270.996 | 67.1919 | 319.998 |
20 | 420.227 | 349.695 | 98.1895 | 784.031 |
40 | 550.898 | 500.414 | 141.809 | 605.844 |
80 | 678.602 | 593.961 | 160.369 | 893.914 |
160 | 950.336 | 795.906 | 347.469 | 1395.81 |
320 | 1413.62 | 1078.04 | 501.898 | 2292.7 |
640 | 2381.8 | 1900.66 | 1173.08 | 4074.59 |
1280 | 4038.94 | 2886.66 | 2061.05 | 7460.19 |
2560 | 7555.19 | 5250.62 | 4110.22 | 14207.6 |
5120 | 14350.1 | 13566.1 | 8790.62 | 27750 |
10240 | 28985.2 | 22903.2 | 16821 | 54359.5 |
20480 | 55309.5 | 32786.8 | 32311.5 | 107578 |
40960 | 110619 | 64813 | 62532.5 | 249750 |
81920 | 222000 | 133428 | 125825 | 429935 |
163840 | 561082 | 258113 | 243287 | 911189 |
327680 | 1.3358e+006 | 710096 | 498740 | 2.05426e+006 |
655360 | 3.02285e+006 | 1.99458e+006 | 1.11266e+006 | 4.65858e+006 |
integer push back - msvc (ns)
N | vector | devector | deque | circular |
---|---|---|---|---|
10 | 634.055 | 349.695 | 404.637 | 452.895 |
20 | 813.727 | 421.715 | 540.508 | 586.539 |
40 | 977.07 | 549.414 | 953.313 | 813.73 |
80 | 1306.72 | 787 | 1787.83 | 1137.44 |
160 | 1823.47 | 1098.83 | 3397.47 | 1758.13 |
320 | 2696.59 | 1496.78 | 5785.19 | 2619.38 |
640 | 3254.91 | 2827.25 | 11214 | 4597.28 |
1280 | 5725.81 | 4288.41 | 21905.4 | 9289.56 |
2560 | 10667.6 | 7578.94 | 44286 | 16250.8 |
5120 | 17201 | 14302.6 | 90092 | 31931.5 |
10240 | 33737 | 27940 | 174483 | 60631.5 |
20480 | 69945 | 54739.5 | 334901 | 119743 |
40960 | 145972 | 107959 | 707435 | 236825 |
81920 | 251650 | 236445 | 1.39282e+06 | 472510 |
163840 | 537894 | 471750 | 2.83849e+06 | 1.07123e+06 |
327680 | 1.90829e+06 | 1.23963e+06 | 5.72069e+06 | 2.53476e+06 |
655360 | 3.72991e+06 | 3.06619e+06 | 1.22613e+07 | 5.19382e+06 |
Push front
In this test, I measured the time taken to push_front
random integers into each of the structures. The performance of devector
also looks good for front insertion, again beaten by the libstdc++ version of std::deque
.
integer push front - g++/mingw (ns)
N | devector | deque | circular |
---|---|---|---|
10 | 417.258 | 109.512 | 348.953 |
20 | 537.531 | 102.457 | 415.773 |
40 | 687.512 | 141.066 | 576.145 |
80 | 662.266 | 276.564 | 882.031 |
160 | 920.641 | 351.922 | 1401.75 |
320 | 1265.14 | 589.508 | 2304.56 |
640 | 2001.66 | 1134.47 | 4145.84 |
1280 | 2981.69 | 2132.33 | 7483.94 |
2560 | 5630.75 | 4419.06 | 14255.1 |
5120 | 13970 | 10501.2 | 27750 |
10240 | 17676.2 | 20194.8 | 63482.5 |
20480 | 33071 | 34402.2 | 108339 |
40960 | 90092 | 67284 | 215537 |
81920 | 130387 | 129626 | 429935 |
163840 | 280161 | 269897 | 970870 |
327680 | 829079 | 572867 | 2.21468e+006 |
655360 | 2.20442e+006 | 1.23126e+006 | 4.86728e+006 |
integer push front - msvc (ns)
N | devector | deque | circular |
---|---|---|---|
10 | 472.199 | 345.24 | 403.895 |
20 | 543.477 | 513.773 | 574.66 |
40 | 721.664 | 837.484 | 751.363 |
80 | 885 | 1633.39 | 1155.26 |
160 | 1223.56 | 3433.09 | 1645.28 |
320 | 1651.22 | 5904 | 2815.39 |
640 | 2922.28 | 12592 | 4846.75 |
1280 | 4822.97 | 23948.6 | 7982.88 |
2560 | 8363 | 48657.5 | 16678.5 |
5120 | 15965.5 | 97315 | 32026.5 |
10240 | 28700 | 187407 | 62722.5 |
20480 | 59681 | 349726 | 112140 |
40960 | 109099 | 716939 | 221239 |
81920 | 231883 | 1.35633e+06 | 484675 |
163840 | 427274 | 2.76398e+06 | 1.13243e+06 |
327680 | 1.43768e+06 | 6.00009e+06 | 2.37434e+06 |
655360 | 3.08025e+06 | 1.26753e+07 | 5.2037e+06 |
Push back / Pop front
In this test, I pushed elements to each structure, and then measured the time taken to pop_front
and push_back
random integers. This test emulates the steady state operation of a queue.
integer push/pop - g++/mingw (ns)
N | devector | deque | circular |
---|---|---|---|
10 | 118.791 | 69.7908 | 48.6299 |
20 | 246.494 | 78.3286 | 106.171 |
40 | 201.205 | 118.05 | 203.432 |
80 | 295.496 | 374.195 | 394.984 |
160 | 662.266 | 687.512 | 1404.72 |
320 | 974.094 | 1603.7 | 1419.56 |
640 | 1728.43 | 3017.31 | 4965.5 |
1280 | 3243.03 | 3777.59 | 5630.75 |
2560 | 6319.75 | 7507.69 | 11261.5 |
5120 | 11451.6 | 20622.4 | 42195 |
10240 | 27179.8 | 44571 | 86291 |
20480 | 53979.5 | 90092.5 | 89712 |
40960 | 96174 | 115941 | 180565 |
81920 | 261534 | 233784 | 363031 |
163840 | 496079 | 481253 | 833261 |
327680 | 1.06666e+006 | 991018 | 1.63877e+006 |
655360 | 2.4268e+006 | 2.04628e+006 | 3.63982e+006 |
integer push/pop - msvc (ns)
N | devector | deque | circular |
---|---|---|---|
10 | 112.853 | 194.151 | 98.375 |
20 | 132.899 | 333.361 | 205.66 |
40 | 206.402 | 590.992 | 392.758 |
80 | 296.238 | 1155.26 | 714.242 |
160 | 507.836 | 2506.52 | 1380.96 |
320 | 1042.41 | 4977.41 | 2732.22 |
640 | 1657.16 | 9099.5 | 5452.59 |
1280 | 3290.56 | 19719.6 | 10881.4 |
2560 | 6581.13 | 39629.3 | 25089 |
5120 | 13209.8 | 72986 | 46756.8 |
10240 | 27464.8 | 158137 | 89712 |
20480 | 54929.5 | 304490 | 191209 |
40960 | 109479 | 612401 | 611641 |
81920 | 246709 | 1.27916e+06 | 754952 |
163840 | 465668 | 2.50739e+06 | 1.52131e+06 |
327680 | 988357 | 5.41734e+06 | 3.12435e+06 |
655360 | 2.23863e+06 | 1.1108e+07 | 6.1396e+06 |
Summary
If you are looking for a std::vector
-like structure with ability to insert/erase at both ends, or need a double-ended queue structure with contiguous storage, I think a double-ended vector is a good alternative.
I think a nice use case is if you need to interact with low-level api’s which takes contiguous ranges, and you don’t want to take the cost of copying the elements to a std::vector
before you call the api.
Related Material
Update: When I wrote this post, I was sure someone had already made similar things, as the basic idea is pretty simple. Here are a couple examples (Please point me to others):