Fast incremental sort 2, priority queues and small scale

In my last post, I talked about how to incrementally sort values with a modification of the incremental quicksort algorithm. I have gotten a lot of great feedback on that post, and I wanted to share the results of some additional experiements that readers suggested.

Priority Queue

A lot of people suggested using a priority queue to solve the problem, and that seems like a natural thing to do. The approach is naturally incremental, each item popped from the queue is the next element in the sorted sequence. Using a standard binary heap, we only need $O(N)$ time to set up the queue, which is optimal.

The problem with priority queues, however, is that the time taken to remove each element from the queue is typically $O(\log N)$ , not $O(\log k)$ as is the optimal for incremental sorting (I’m assuming a simple binary heap implementation or similar). Thus we are spending too much time maintaining the heap property of the large range. In fact, the original paper on incremental quicksort showed how the algorithm sped up Kruskal’s algorithm compared to a version using a priority queue.

All those reasons aside, it’s always safest to measure. To implement a heap sorter in C++, I used the standard heap algorithms. The standard C++ heap algorithms assume a max heap rooted at the beginning of the range, which wouldn’t really work. Thankfully, by passing std::reverse_iterators we get the root of the heap at the end, and by using std::greater as the comparator, we get a min heap. Since this min heap will spit out the smallest element at the beginning of its range, we can call it each time we want a new item in the sequence.

template<typename I>
class heap_sorter {
public:
  heap_sorter(I i1, I i2) : first(i1), last(i2) {}

  class iterator {
  public:
    ...
    iterator(I first, I last)
	  :
	  current(std::make_reverse_iterator(first)),
	  end(std::make_reverse_iterator(last))
	{
      std::make_heap(end, current, std::greater<>());
      extract_current();
    }
    bool operator==(const iterator& o) const { return current == o.current; }
    bool operator!=(const iterator& o) const { return current != o.current; }
    iterator& operator++() {
      --current;
      extract_current();
      return *this;
    }

    value_type& operator*() {
      return *(current - 1);
    }

  private:
    void extract_current() {
      if (current != end) {
        std::pop_heap(end, current, std::greater<>());
      }
    }
    std::reverse_iterator<I> current;
    std::reverse_iterator<I> end;
  };

  iterator begin() {
    return iterator(first, last);
  }

  iterator end() {
    return iterator(last, last);
  }

private:
  I first;
  I last;
};

As we can see, the heap sorter does solve the problem, and is faster than doing a full sort for small and medium $k$ , but it’s not really a competitor against the incremental quicksort based sorts. If I have missed some simple way to optimize the heap sort, please let me know.

Heap Sorting (ms)

k	Pre-Sorter	Partial Sorter	Simple Quick Sorter	Skewed Quick Sorter	Heap Sorter
8	1083.18	7.8649	53.6251	6.19743	120.604
16	1083.18	7.86566	53.6259	6.19819	120.608
32	1083.18	7.86566	53.6263	6.19819	120.616
64	1083.18	7.86566	53.627	6.19857	120.628
128	1083.18	15.5341	53.6305	6.20123	120.656
256	1083.18	15.5348	53.6377	6.20617	120.709
512	1083.18	23.1146	53.646	6.21415	120.81
1024	1083.18	31.8237	53.6704	6.23734	121.003
2048	1083.18	41.0193	53.7103	6.3947	121.375
4096	1083.18	52.1542	54.2732	6.48554	122.072
8192	1083.19	68.4948	54.7213	6.84245	123.357
16384	1083.19	93.337	55.2367	7.52472	125.875
32768	1083.22	135.449	56.2064	8.93866	131.156
65536	1083.24	202.845	57.8833	11.1668	143.398
131072	1083.28	308.542	65.7102	24.4606	163.942
262144	1083.36	496.541	74.766	34.4201	205.644
524288	1083.48	832.594	92.7102	53.3226	285.62
1048576	1083.71	1459.07	142.352	110.043	448.69
2097152	1084	2387.17	219.68	216.516	777.519
4194304	1084.61	3569.34	428.108	421.007	1457.27
8388608	1085.75	4102.78	796.165	792.765	2627.77
10000000	1086.14	4105.16	919.948	927.276	2874.77

Smaller Scale

I was also asked why I had chosen the rather arbitrary scale of $N=10^7$ . I didn’t have a really good answer, so I thought I would try out for some smaller values of $N$ , to see if anything interesting happens.

N=1,000,000 (ms)

k	Pre-Sorter	Partial Sorter	Simple Quick Sorter	Skewed Quick Sorter	Heap Sorter
8	100.112	1.07757	4.85798	0.977982	12.6731
16	100.113	1.07833	4.85836	0.978362	12.675
32	100.113	1.07833	4.85836	0.978742	12.6776
64	100.113	1.07871	4.85874	0.979122	12.6826
128	100.113	2.34746	4.8633	0.982543	12.6925
256	100.113	2.34822	4.8652	0.988244	12.7126
512	100.113	3.81234	4.88458	1.01865	12.7529
1024	100.113	5.61893	4.91081	1.11329	12.832
2048	100.114	7.73529	5.04384	1.18209	12.9912
4096	100.114	10.957	5.14305	1.36264	13.3132
8192	100.116	17.5923	5.37681	1.69636	13.9479
16384	100.119	28.9175	5.84546	3.49002	15.2083
32768	100.128	47.1316	7.26093	4.60788	17.715
65536	100.142	71.585	11.4572	9.61257	24.2485
131072	100.171	115.55	18.7428	17.3494	34.7828
262144	100.23	170.876	30.4098	31.349	56.4344
524288	100.329	233.281	60.184	53.2618	96.5959
1000000	100.503	249.507	104.635	89.1293	160.038

N=100,000 (ms)

k	Pre-Sorter	Partial Sorter	Simple Quick Sorter	Skewed Quick Sorter	Heap Sorter
8	9.49702	0.197269	1.20338	0.130372	1.48769
16	9.49778	0.197649	1.20376	0.130752	1.48921
32	9.49778	0.197649	1.20414	0.131132	1.49225
64	9.49778	0.197649	1.20452	0.131512	1.49757
128	9.49816	0.44319	1.20718	0.137594	1.50821
256	9.49816	0.44357	1.21516	0.141015	1.52836
512	9.49816	0.834306	1.24101	0.158879	1.56979
1024	9.49854	1.50517	1.27407	0.192327	1.65189
2048	9.49854	2.66788	1.35313	0.250862	1.82749
4096	9.4993	4.58963	1.49339	0.542014	2.16805
8192	9.50006	7.60758	1.791	0.888279	2.83512
16384	9.50196	11.9475	2.49493	1.83433	4.11223
32768	9.50842	17.4326	3.71276	3.85453	6.5703
65536	9.51717	21.8413	6.40002	7.04808	11.2747
100000	9.52819	21.8949	9.20131	10.0223	15.5835

N=10,000 (ms)

k	Pre-Sorter	Partial Sorter	Simple Quick Sorter	Skewed Quick Sorter	Heap Sorter
8	0.657563	0.055874	0.092743	0.025846	0.146716
16	0.657563	0.056254	0.093123	0.026607	0.148236
32	0.657563	0.056254	0.093123	0.030408	0.150137
64	0.657943	0.056634	0.095024	0.030788	0.154318
128	0.657943	0.141015	0.098825	0.031548	0.16192
256	0.657943	0.141015	0.109087	0.056634	0.177504
512	0.657943	0.307496	0.124671	0.068797	0.209051
1024	0.658323	0.5671	0.147857	0.12125	0.270627
2048	0.658323	0.960117	0.264166	0.190427	0.387315
4096	0.658703	1.40331	0.408221	0.345885	0.613091
8192	0.659083	1.67241	0.68759	0.652621	1.03424
10000	0.659463	1.67545	0.789075	0.772351	1.18855

N=1,000 (ms)

k	Pre-Sorter	Partial Sorter	Simple Quick Sorter	Skewed Quick Sorter	Heap Sorter
8	0.061575	0.034969	0.021666	0.026226	0.015203
16	0.061575	0.035349	0.022046	0.026226	0.016344
32	0.061955	0.035349	0.022046	0.026606	0.017864
64	0.061955	0.035349	0.022426	0.026986	0.020905
128	0.061955	0.070698	0.043711	0.030407	0.026986
256	0.062335	0.071078	0.046752	0.04371	0.038389
512	0.062335	0.113648	0.060815	0.061195	0.058914
1000	0.062335	0.131893	0.082861	0.08362	0.090462

N=100 (ms)

k	Pre-Sorter	Partial Sorter	Simple Quick Sorter	Skewed Quick Sorter	Heap Sorter
8	0.005701	0.020525	0.012543	0.014064	0.00266
16	0.006081	0.020905	0.012923	0.014064	0.003421
32	0.006081	0.020905	0.012923	0.014444	0.004561
64	0.006081	0.021285	0.013303	0.014824	0.006461
100	0.006461	0.021285	0.013683	0.015204	0.007602

I really don’t know if these microsesecond level measurements are reliable, as they were not what I intended to measure when I wrote the original timer code. Anyway, I think we can see a pattern. My custom sorters have a certain startup cost, that starts making a difference at $N\le 1000$ . The heap based sorter seems to an excellent choice for that case. For $N\ge 10^4$ , skewed incremental quicksort seems like the best option.

An interesting next step is to if it’s possible to make a sorter that behaves like the heap sorter for small $N$ , and like the skewed incremental quicksort for large $N$ . I would guess the main challenge is to avoid the overhead over switching between two possible implementations.