In my last post, I talked about how to incrementally sort values with a modification of the incremental quicksort algorithm. I have gotten a lot of great feedback on that post, and I wanted to share the results of some additional experiements that readers suggested.
Priority Queue
A lot of people suggested using a priority queue to solve the problem, and that seems like a natural thing to do. The approach is naturally incremental, each item popped from the queue is the next element in the sorted sequence. Using a standard binary heap, we only need time to set up the queue, which is optimal.
The problem with priority queues, however, is that the time taken to remove each element from the queue is typically , not as is the optimal for incremental sorting (I’m assuming a simple binary heap implementation or similar). Thus we are spending too much time maintaining the heap property of the large range. In fact, the original paper on incremental quicksort showed how the algorithm sped up Kruskal’s algorithm compared to a version using a priority queue.
All those reasons aside, it’s always safest to measure. To implement a heap sorter in C++, I used the standard heap algorithms. The standard C++ heap algorithms assume a max heap rooted at the beginning of the range, which wouldn’t really work. Thankfully, by passing std::reverse_iterator
s we get the root of the heap at the end, and by using std::greater
as the comparator, we get a min heap. Since this min heap will spit out the smallest element at the beginning of its range, we can call it each time we want a new item in the sequence.
As we can see, the heap sorter does solve the problem, and is faster than doing a full sort for small and medium , but it’s not really a competitor against the incremental quicksort based sorts. If I have missed some simple way to optimize the heap sort, please let me know.
Heap Sorting (ms)
k | Pre-Sorter | Partial Sorter | Simple Quick Sorter | Skewed Quick Sorter | Heap Sorter |
---|---|---|---|---|---|
8 | 1083.18 | 7.8649 | 53.6251 | 6.19743 | 120.604 |
16 | 1083.18 | 7.86566 | 53.6259 | 6.19819 | 120.608 |
32 | 1083.18 | 7.86566 | 53.6263 | 6.19819 | 120.616 |
64 | 1083.18 | 7.86566 | 53.627 | 6.19857 | 120.628 |
128 | 1083.18 | 15.5341 | 53.6305 | 6.20123 | 120.656 |
256 | 1083.18 | 15.5348 | 53.6377 | 6.20617 | 120.709 |
512 | 1083.18 | 23.1146 | 53.646 | 6.21415 | 120.81 |
1024 | 1083.18 | 31.8237 | 53.6704 | 6.23734 | 121.003 |
2048 | 1083.18 | 41.0193 | 53.7103 | 6.3947 | 121.375 |
4096 | 1083.18 | 52.1542 | 54.2732 | 6.48554 | 122.072 |
8192 | 1083.19 | 68.4948 | 54.7213 | 6.84245 | 123.357 |
16384 | 1083.19 | 93.337 | 55.2367 | 7.52472 | 125.875 |
32768 | 1083.22 | 135.449 | 56.2064 | 8.93866 | 131.156 |
65536 | 1083.24 | 202.845 | 57.8833 | 11.1668 | 143.398 |
131072 | 1083.28 | 308.542 | 65.7102 | 24.4606 | 163.942 |
262144 | 1083.36 | 496.541 | 74.766 | 34.4201 | 205.644 |
524288 | 1083.48 | 832.594 | 92.7102 | 53.3226 | 285.62 |
1048576 | 1083.71 | 1459.07 | 142.352 | 110.043 | 448.69 |
2097152 | 1084 | 2387.17 | 219.68 | 216.516 | 777.519 |
4194304 | 1084.61 | 3569.34 | 428.108 | 421.007 | 1457.27 |
8388608 | 1085.75 | 4102.78 | 796.165 | 792.765 | 2627.77 |
10000000 | 1086.14 | 4105.16 | 919.948 | 927.276 | 2874.77 |
Smaller Scale
I was also asked why I had chosen the rather arbitrary scale of . I didn’t have a really good answer, so I thought I would try out for some smaller values of , to see if anything interesting happens.
N=1,000,000 (ms)
k | Pre-Sorter | Partial Sorter | Simple Quick Sorter | Skewed Quick Sorter | Heap Sorter |
---|---|---|---|---|---|
8 | 100.112 | 1.07757 | 4.85798 | 0.977982 | 12.6731 |
16 | 100.113 | 1.07833 | 4.85836 | 0.978362 | 12.675 |
32 | 100.113 | 1.07833 | 4.85836 | 0.978742 | 12.6776 |
64 | 100.113 | 1.07871 | 4.85874 | 0.979122 | 12.6826 |
128 | 100.113 | 2.34746 | 4.8633 | 0.982543 | 12.6925 |
256 | 100.113 | 2.34822 | 4.8652 | 0.988244 | 12.7126 |
512 | 100.113 | 3.81234 | 4.88458 | 1.01865 | 12.7529 |
1024 | 100.113 | 5.61893 | 4.91081 | 1.11329 | 12.832 |
2048 | 100.114 | 7.73529 | 5.04384 | 1.18209 | 12.9912 |
4096 | 100.114 | 10.957 | 5.14305 | 1.36264 | 13.3132 |
8192 | 100.116 | 17.5923 | 5.37681 | 1.69636 | 13.9479 |
16384 | 100.119 | 28.9175 | 5.84546 | 3.49002 | 15.2083 |
32768 | 100.128 | 47.1316 | 7.26093 | 4.60788 | 17.715 |
65536 | 100.142 | 71.585 | 11.4572 | 9.61257 | 24.2485 |
131072 | 100.171 | 115.55 | 18.7428 | 17.3494 | 34.7828 |
262144 | 100.23 | 170.876 | 30.4098 | 31.349 | 56.4344 |
524288 | 100.329 | 233.281 | 60.184 | 53.2618 | 96.5959 |
1000000 | 100.503 | 249.507 | 104.635 | 89.1293 | 160.038 |
N=100,000 (ms)
k | Pre-Sorter | Partial Sorter | Simple Quick Sorter | Skewed Quick Sorter | Heap Sorter |
---|---|---|---|---|---|
8 | 9.49702 | 0.197269 | 1.20338 | 0.130372 | 1.48769 |
16 | 9.49778 | 0.197649 | 1.20376 | 0.130752 | 1.48921 |
32 | 9.49778 | 0.197649 | 1.20414 | 0.131132 | 1.49225 |
64 | 9.49778 | 0.197649 | 1.20452 | 0.131512 | 1.49757 |
128 | 9.49816 | 0.44319 | 1.20718 | 0.137594 | 1.50821 |
256 | 9.49816 | 0.44357 | 1.21516 | 0.141015 | 1.52836 |
512 | 9.49816 | 0.834306 | 1.24101 | 0.158879 | 1.56979 |
1024 | 9.49854 | 1.50517 | 1.27407 | 0.192327 | 1.65189 |
2048 | 9.49854 | 2.66788 | 1.35313 | 0.250862 | 1.82749 |
4096 | 9.4993 | 4.58963 | 1.49339 | 0.542014 | 2.16805 |
8192 | 9.50006 | 7.60758 | 1.791 | 0.888279 | 2.83512 |
16384 | 9.50196 | 11.9475 | 2.49493 | 1.83433 | 4.11223 |
32768 | 9.50842 | 17.4326 | 3.71276 | 3.85453 | 6.5703 |
65536 | 9.51717 | 21.8413 | 6.40002 | 7.04808 | 11.2747 |
100000 | 9.52819 | 21.8949 | 9.20131 | 10.0223 | 15.5835 |
N=10,000 (ms)
k | Pre-Sorter | Partial Sorter | Simple Quick Sorter | Skewed Quick Sorter | Heap Sorter |
---|---|---|---|---|---|
8 | 0.657563 | 0.055874 | 0.092743 | 0.025846 | 0.146716 |
16 | 0.657563 | 0.056254 | 0.093123 | 0.026607 | 0.148236 |
32 | 0.657563 | 0.056254 | 0.093123 | 0.030408 | 0.150137 |
64 | 0.657943 | 0.056634 | 0.095024 | 0.030788 | 0.154318 |
128 | 0.657943 | 0.141015 | 0.098825 | 0.031548 | 0.16192 |
256 | 0.657943 | 0.141015 | 0.109087 | 0.056634 | 0.177504 |
512 | 0.657943 | 0.307496 | 0.124671 | 0.068797 | 0.209051 |
1024 | 0.658323 | 0.5671 | 0.147857 | 0.12125 | 0.270627 |
2048 | 0.658323 | 0.960117 | 0.264166 | 0.190427 | 0.387315 |
4096 | 0.658703 | 1.40331 | 0.408221 | 0.345885 | 0.613091 |
8192 | 0.659083 | 1.67241 | 0.68759 | 0.652621 | 1.03424 |
10000 | 0.659463 | 1.67545 | 0.789075 | 0.772351 | 1.18855 |
N=1,000 (ms)
k | Pre-Sorter | Partial Sorter | Simple Quick Sorter | Skewed Quick Sorter | Heap Sorter |
---|---|---|---|---|---|
8 | 0.061575 | 0.034969 | 0.021666 | 0.026226 | 0.015203 |
16 | 0.061575 | 0.035349 | 0.022046 | 0.026226 | 0.016344 |
32 | 0.061955 | 0.035349 | 0.022046 | 0.026606 | 0.017864 |
64 | 0.061955 | 0.035349 | 0.022426 | 0.026986 | 0.020905 |
128 | 0.061955 | 0.070698 | 0.043711 | 0.030407 | 0.026986 |
256 | 0.062335 | 0.071078 | 0.046752 | 0.04371 | 0.038389 |
512 | 0.062335 | 0.113648 | 0.060815 | 0.061195 | 0.058914 |
1000 | 0.062335 | 0.131893 | 0.082861 | 0.08362 | 0.090462 |
N=100 (ms)
k | Pre-Sorter | Partial Sorter | Simple Quick Sorter | Skewed Quick Sorter | Heap Sorter |
---|---|---|---|---|---|
8 | 0.005701 | 0.020525 | 0.012543 | 0.014064 | 0.00266 |
16 | 0.006081 | 0.020905 | 0.012923 | 0.014064 | 0.003421 |
32 | 0.006081 | 0.020905 | 0.012923 | 0.014444 | 0.004561 |
64 | 0.006081 | 0.021285 | 0.013303 | 0.014824 | 0.006461 |
100 | 0.006461 | 0.021285 | 0.013683 | 0.015204 | 0.007602 |
I really don’t know if these microsesecond level measurements are reliable, as they were not what I intended to measure when I wrote the original timer code. Anyway, I think we can see a pattern. My custom sorters have a certain startup cost, that starts making a difference at . The heap based sorter seems to an excellent choice for that case. For , skewed incremental quicksort seems like the best option.
An interesting next step is to if it’s possible to make a sorter that behaves like the heap sorter for small , and like the skewed incremental quicksort for large . I would guess the main challenge is to avoid the overhead over switching between two possible implementations.