# Double-ended vector - is it useful?

22 May 2016

I have frequently seen the question ‘How do I push an element to the front of a C++ vector?’. Usually the correct answer is to use std::deque instead, and that’s the end of it. But what if you also want contiguous storage? In this post I present a simple implementation of a double-ended vector, devector, and compare its performance to std::vector, std::deque, and a circular buffer.

## Basic Concept

A std::vector is typically implemented as a memory buffer of a certain size, which is partially filled up with elements, beginning at the start of the buffer. The distance from the last element to the end of the buffer is the spare capacity of the vector, and determines how many elements we can insert before we need to reallocate. The fact that we only have spare capacity at the back end of the vector means that we can only insert efficiently at the back (or close to it).

The most important change needed to allow fast insertion at both ends is to let the devector have spare capacity at both ends. This means we keep a pointer to the start of the elements, which can be different from the start of the underlying buffer. As long as there is spare capacity at both ends of the buffer, we can insert at both ends quickly.

## Reaching the end

Inevitably, we will use up all spare capacity on one side of the buffer; how do we handle the next insertion? For std::vector this is trivial: when we reach the back of the buffer we have to reallocate, because there is never room at the front. For devector, however, we have a few options when the other side of the buffer still has spare capacity:

1. Move the elements one step toward the other side. This is equivalent to how front insertions are handled in std::vector, and is very slow: using this strategy, the devector would degrade to $O(n)$ insertions at one end.
2. Move the elements several steps toward the other side. This is more interesting because of its nice asymptotic properties. The initial insert is still slow, but the following inserts will be fast again. If we move the elements such that the remaining spare capacity is equal on both sides, we only need to do this $O(\log n)$ times before the buffer is completely full, because each time we fill up one side and move the elements, the spare capacity is cut in half. Since we start with $O(n)$ spare capacity after the previous reallocation, it takes $O(\log n)$ halvings to get down to a constant amount of spare capacity.
3. Reallocate, and place the elements in the middle of a new buffer. This is fast for a growing devector, because we don’t waste time moving elements back and forth in the same small buffer. It is bad for memory usage, though, because we reallocate to a larger buffer even though we haven’t filled up the current one. It is also bad if we are pushing elements on one side and popping them from the other, because then reaching the end doesn’t mean we need more space; it just means the elements should be moved.

The best option is a combination of 2 and 3. When there is still a lot of room left in the buffer, we should move elements toward the middle and not reallocate straight away. However, as the buffer fills up, the gains from moving elements diminish while the cost increases. Moving $O(n)$ elements $O(\log n)$ times takes $O(n \log n)$ work, and amortized over $\Theta(n)$ inserts this is $O(\log n)$ time per element, which is unacceptable. A nice solution is to move elements if the size-to-capacity ratio is less than some limit $\alpha$, and reallocate if the ratio is larger.

I will now show that this tactic leads to amortized constant time inserts. After the previous reallocation, the devector is $\frac{\alpha}{\beta}$ full, where $\beta$ is the growth factor used for reallocations. For example, if the limit is 90% and the growth factor is 2, the devector will be approximately 45% full after a reallocation. The spare capacity right after the reallocation is $1-\frac{\alpha}{\beta}$ (55% in the example). If every move operation cuts the spare capacity in two, it takes $\log_2 \frac{1-\alpha/\beta}{1-\alpha}$ move operations before we reach the limit again and should do a new reallocation. Since $\alpha$ and $\beta$ are constants, the number of move operations is constant. Moving $O(n)$ elements $O(1)$ times takes $O(n)$ work, and since it is amortized over $\Theta(n)$ inserts, we get $O(1)$ amortized time per insert.
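Plugging the example numbers ($\alpha = 0.9$, $\beta = 2$) into the formula makes the constant concrete:

```latex
\log_2 \frac{1 - \alpha/\beta}{1 - \alpha}
  = \log_2 \frac{1 - 0.45}{1 - 0.9}
  = \log_2 \frac{0.55}{0.10}
  = \log_2 5.5 \approx 2.46
```

So at most three move operations happen between consecutive reallocations, independent of $n$.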

It’s great to have $O(1)$ insertion time at both ends, but the constant factors can still be high, as we must move all elements several times between each reallocation. The problem is that no matter how the devector is used, we treat insertions at both ends as equally likely, so we try to keep equal amounts of spare capacity on both ends. What if we assume that past behaviour predicts future behaviour? That’s often a reasonable assumption. If we keep track of how many insertions are done at each end, we can give more spare capacity to the active end, thus cutting down on (or entirely eliminating) the move operations. As long as we always give $\Theta(n)$ spare capacity to both ends, the analysis above still holds.

For example, suppose we have reached the back end of the buffer after a series of push_backs, and the buffer is 70% full. A non-adaptive tactic would give 15% spare capacity to each side. We can then push_back 15% more elements before reaching the limit again. At that point the buffer is 85% full, and we need to move the elements again (assuming the reallocation limit is 90%).

With an adaptive tactic, we could give the front 5% spare capacity and the back 25%. Then we can push_back 25% more before reaching the end. Now the buffer is 95% full, and we go straight to reallocation, with no additional move operation in between.

## Code

The code for devector, as well as the performance measurements, is available at github/cppwhiletrue. There are probably still bugs in the code, but I think it’s correct enough for the measurements to be valid. I apologize for the non-portable time measurement code, but it was the simplest way to get high-resolution timing with g++/mingw.

## Results

All results reported are the median of 51 runs. The performance tests were run on an Intel Core i7-4600U @2.1GHz. I report results from both the MSVC compiler and g++/mingw. The circular data structure used in the tests is a slightly adapted boost::circular_buffer_space_optimized.

### Push back

In this test, I measured the time taken to push_back $N$ random integers into each of the structures. The performance of devector looks good for back insertion, beaten only by the libstdc++ version of std::deque. The difference between the libstdc++ and MSVC versions of std::deque is likely due to the small node size used by MSVC.

#### integer push back - g++/mingw (ns)

| N | vector | devector | deque | circular |
|------:|------:|------:|------:|------:|
| 10 | 354.891 | 270.996 | 67.1919 | 319.998 |
| 20 | 420.227 | 349.695 | 98.1895 | 784.031 |
| 40 | 550.898 | 500.414 | 141.809 | 605.844 |
| 80 | 678.602 | 593.961 | 160.369 | 893.914 |
| 160 | 950.336 | 795.906 | 347.469 | 1395.81 |
| 320 | 1413.62 | 1078.04 | 501.898 | 2292.7 |
| 640 | 2381.8 | 1900.66 | 1173.08 | 4074.59 |
| 1280 | 4038.94 | 2886.66 | 2061.05 | 7460.19 |
| 2560 | 7555.19 | 5250.62 | 4110.22 | 14207.6 |
| 5120 | 14350.1 | 13566.1 | 8790.62 | 27750 |
| 10240 | 28985.2 | 22903.2 | 16821 | 54359.5 |
| 20480 | 55309.5 | 32786.8 | 32311.5 | 107578 |
| 40960 | 110619 | 64813 | 62532.5 | 249750 |
| 81920 | 222000 | 133428 | 125825 | 429935 |
| 163840 | 561082 | 258113 | 243287 | 911189 |
| 327680 | 1.3358e+006 | 710096 | 498740 | 2.05426e+006 |
| 655360 | 3.02285e+006 | 1.99458e+006 | 1.11266e+006 | 4.65858e+006 |

#### integer push back - msvc (ns)

| N | vector | devector | deque | circular |
|------:|------:|------:|------:|------:|
| 10 | 634.055 | 349.695 | 404.637 | 452.895 |
| 20 | 813.727 | 421.715 | 540.508 | 586.539 |
| 40 | 977.07 | 549.414 | 953.313 | 813.73 |
| 80 | 1306.72 | 787 | 1787.83 | 1137.44 |
| 160 | 1823.47 | 1098.83 | 3397.47 | 1758.13 |
| 320 | 2696.59 | 1496.78 | 5785.19 | 2619.38 |
| 640 | 3254.91 | 2827.25 | 11214 | 4597.28 |
| 1280 | 5725.81 | 4288.41 | 21905.4 | 9289.56 |
| 2560 | 10667.6 | 7578.94 | 44286 | 16250.8 |
| 5120 | 17201 | 14302.6 | 90092 | 31931.5 |
| 10240 | 33737 | 27940 | 174483 | 60631.5 |
| 20480 | 69945 | 54739.5 | 334901 | 119743 |
| 40960 | 145972 | 107959 | 707435 | 236825 |
| 81920 | 251650 | 236445 | 1.39282e+06 | 472510 |
| 163840 | 537894 | 471750 | 2.83849e+06 | 1.07123e+06 |
| 327680 | 1.90829e+06 | 1.23963e+06 | 5.72069e+06 | 2.53476e+06 |
| 655360 | 3.72991e+06 | 3.06619e+06 | 1.22613e+07 | 5.19382e+06 |

### Push front

In this test, I measured the time taken to push_front $N$ random integers into each of the structures. The performance of devector also looks good for front insertion, again beaten only by the libstdc++ version of std::deque.

#### integer push front - g++/mingw (ns)

| N | devector | deque | circular |
|------:|------:|------:|------:|
| 10 | 417.258 | 109.512 | 348.953 |
| 20 | 537.531 | 102.457 | 415.773 |
| 40 | 687.512 | 141.066 | 576.145 |
| 80 | 662.266 | 276.564 | 882.031 |
| 160 | 920.641 | 351.922 | 1401.75 |
| 320 | 1265.14 | 589.508 | 2304.56 |
| 640 | 2001.66 | 1134.47 | 4145.84 |
| 1280 | 2981.69 | 2132.33 | 7483.94 |
| 2560 | 5630.75 | 4419.06 | 14255.1 |
| 5120 | 13970 | 10501.2 | 27750 |
| 10240 | 17676.2 | 20194.8 | 63482.5 |
| 20480 | 33071 | 34402.2 | 108339 |
| 40960 | 90092 | 67284 | 215537 |
| 81920 | 130387 | 129626 | 429935 |
| 163840 | 280161 | 269897 | 970870 |
| 327680 | 829079 | 572867 | 2.21468e+006 |
| 655360 | 2.20442e+006 | 1.23126e+006 | 4.86728e+006 |

#### integer push front - msvc (ns)

| N | devector | deque | circular |
|------:|------:|------:|------:|
| 10 | 472.199 | 345.24 | 403.895 |
| 20 | 543.477 | 513.773 | 574.66 |
| 40 | 721.664 | 837.484 | 751.363 |
| 80 | 885 | 1633.39 | 1155.26 |
| 160 | 1223.56 | 3433.09 | 1645.28 |
| 320 | 1651.22 | 5904 | 2815.39 |
| 640 | 2922.28 | 12592 | 4846.75 |
| 1280 | 4822.97 | 23948.6 | 7982.88 |
| 2560 | 8363 | 48657.5 | 16678.5 |
| 5120 | 15965.5 | 97315 | 32026.5 |
| 10240 | 28700 | 187407 | 62722.5 |
| 20480 | 59681 | 349726 | 112140 |
| 40960 | 109099 | 716939 | 221239 |
| 81920 | 231883 | 1.35633e+06 | 484675 |
| 163840 | 427274 | 2.76398e+06 | 1.13243e+06 |
| 327680 | 1.43768e+06 | 6.00009e+06 | 2.37434e+06 |
| 655360 | 3.08025e+06 | 1.26753e+07 | 5.2037e+06 |

### Push back / Pop front

In this test, I pushed $N$ elements to each structure, and then measured the time taken to pop_front and push_back $N$ random integers. This test emulates the steady state operation of a queue.

#### integer push/pop - g++/mingw (ns)

| N | devector | deque | circular |
|------:|------:|------:|------:|
| 10 | 118.791 | 69.7908 | 48.6299 |
| 20 | 246.494 | 78.3286 | 106.171 |
| 40 | 201.205 | 118.05 | 203.432 |
| 80 | 295.496 | 374.195 | 394.984 |
| 160 | 662.266 | 687.512 | 1404.72 |
| 320 | 974.094 | 1603.7 | 1419.56 |
| 640 | 1728.43 | 3017.31 | 4965.5 |
| 1280 | 3243.03 | 3777.59 | 5630.75 |
| 2560 | 6319.75 | 7507.69 | 11261.5 |
| 5120 | 11451.6 | 20622.4 | 42195 |
| 10240 | 27179.8 | 44571 | 86291 |
| 20480 | 53979.5 | 90092.5 | 89712 |
| 40960 | 96174 | 115941 | 180565 |
| 81920 | 261534 | 233784 | 363031 |
| 163840 | 496079 | 481253 | 833261 |
| 327680 | 1.06666e+006 | 991018 | 1.63877e+006 |
| 655360 | 2.4268e+006 | 2.04628e+006 | 3.63982e+006 |

#### integer push/pop - msvc (ns)

| N | devector | deque | circular |
|------:|------:|------:|------:|
| 10 | 112.853 | 194.151 | 98.375 |
| 20 | 132.899 | 333.361 | 205.66 |
| 40 | 206.402 | 590.992 | 392.758 |
| 80 | 296.238 | 1155.26 | 714.242 |
| 160 | 507.836 | 2506.52 | 1380.96 |
| 320 | 1042.41 | 4977.41 | 2732.22 |
| 640 | 1657.16 | 9099.5 | 5452.59 |
| 1280 | 3290.56 | 19719.6 | 10881.4 |
| 2560 | 6581.13 | 39629.3 | 25089 |
| 5120 | 13209.8 | 72986 | 46756.8 |
| 10240 | 27464.8 | 158137 | 89712 |
| 20480 | 54929.5 | 304490 | 191209 |
| 40960 | 109479 | 612401 | 611641 |
| 81920 | 246709 | 1.27916e+06 | 754952 |
| 163840 | 465668 | 2.50739e+06 | 1.52131e+06 |
| 327680 | 988357 | 5.41734e+06 | 3.12435e+06 |
| 655360 | 2.23863e+06 | 1.1108e+07 | 6.1396e+06 |

## Summary

If you are looking for a std::vector-like structure with the ability to insert and erase at both ends, or need a double-ended queue structure with contiguous storage, I think a double-ended vector is a good alternative.

I think a nice use case is when you need to interact with low-level APIs that take contiguous ranges, and you don’t want to pay the cost of copying the elements into a std::vector before calling the API.

## Related Material

Update: When I wrote this post, I was sure someone had already made similar things, as the basic idea is pretty simple. Here are a couple of examples (please point me to others):

• QList, used by Qt, uses the same basic idea.
• Orson Peters implemented a very similar structure, also called devector, which can be found here.