Fast incremental sort 2, priority queues and small scale

25 Apr 2016

In my last post, I talked about how to incrementally sort values with a modification of the incremental quicksort algorithm. I have gotten a lot of great feedback on that post, and I wanted to share the results of some additional experiements that readers suggested.

Priority Queue

A lot of people suggested using a priority queue to solve the problem, and that seems like a natural thing to do. The approach is naturally incremental, each item popped from the queue is the next element in the sorted sequence. Using a standard binary heap, we only need time to set up the queue, which is optimal.

The problem with priority queues, however, is that the time taken to remove each element from the queue is typically , not as is the optimal for incremental sorting (I’m assuming a simple binary heap implementation or similar). Thus we are spending too much time maintaining the heap property of the large range. In fact, the original paper on incremental quicksort showed how the algorithm sped up Kruskal’s algorithm compared to a version using a priority queue.

All those reasons aside, it’s always safest to measure. To implement a heap sorter in C++, I used the standard heap algorithms. The standard C++ heap algorithms assume a max heap rooted at the beginning of the range, which wouldn’t really work. Thankfully, by passing std::reverse_iterators we get the root of the heap at the end, and by using std::greater as the comparator, we get a min heap. Since this min heap will spit out the smallest element at the beginning of its range, we can call it each time we want a new item in the sequence.

template<typename I>
class heap_sorter {
public:
  heap_sorter(I i1, I i2) : first(i1), last(i2) {}

  class iterator {
  public:
    ...
    iterator(I first, I last)
	  :
	  current(std::make_reverse_iterator(first)),
	  end(std::make_reverse_iterator(last))
	{
      std::make_heap(end, current, std::greater<>());
      extract_current();
    }
    bool operator==(const iterator& o) const { return current == o.current; }
    bool operator!=(const iterator& o) const { return current != o.current; }
    iterator& operator++() {
      --current;
      extract_current();
      return *this;
    }

    value_type& operator*() {
      return *(current - 1);
    }

  private:
    void extract_current() {
      if (current != end) {
        std::pop_heap(end, current, std::greater<>());
      }
    }
    std::reverse_iterator<I> current;
    std::reverse_iterator<I> end;
  };

  iterator begin() {
    return iterator(first, last);
  }

  iterator end() {
    return iterator(last, last);
  }

private:
  I first;
  I last;
};

As we can see, the heap sorter does solve the problem, and is faster than doing a full sort for small and medium , but it’s not really a competitor against the incremental quicksort based sorts. If I have missed some simple way to optimize the heap sort, please let me know.

Heap Sorting (ms)

k Pre-Sorter Partial Sorter Simple Quick Sorter Skewed Quick Sorter Heap Sorter
8 1083.18 7.8649 53.6251 6.19743 120.604
16 1083.18 7.86566 53.6259 6.19819 120.608
32 1083.18 7.86566 53.6263 6.19819 120.616
64 1083.18 7.86566 53.627 6.19857 120.628
128 1083.18 15.5341 53.6305 6.20123 120.656
256 1083.18 15.5348 53.6377 6.20617 120.709
512 1083.18 23.1146 53.646 6.21415 120.81
1024 1083.18 31.8237 53.6704 6.23734 121.003
2048 1083.18 41.0193 53.7103 6.3947 121.375
4096 1083.18 52.1542 54.2732 6.48554 122.072
8192 1083.19 68.4948 54.7213 6.84245 123.357
16384 1083.19 93.337 55.2367 7.52472 125.875
32768 1083.22 135.449 56.2064 8.93866 131.156
65536 1083.24 202.845 57.8833 11.1668 143.398
131072 1083.28 308.542 65.7102 24.4606 163.942
262144 1083.36 496.541 74.766 34.4201 205.644
524288 1083.48 832.594 92.7102 53.3226 285.62
1048576 1083.71 1459.07 142.352 110.043 448.69
2097152 1084 2387.17 219.68 216.516 777.519
4194304 1084.61 3569.34 428.108 421.007 1457.27
8388608 1085.75 4102.78 796.165 792.765 2627.77
10000000 1086.14 4105.16 919.948 927.276 2874.77

Smaller Scale

I was also asked why I had chosen the rather arbitrary scale of . I didn’t have a really good answer, so I thought I would try out for some smaller values of , to see if anything interesting happens.

N=1,000,000 (ms)

k Pre-Sorter Partial Sorter Simple Quick Sorter Skewed Quick Sorter Heap Sorter
8 100.112 1.07757 4.85798 0.977982 12.6731
16 100.113 1.07833 4.85836 0.978362 12.675
32 100.113 1.07833 4.85836 0.978742 12.6776
64 100.113 1.07871 4.85874 0.979122 12.6826
128 100.113 2.34746 4.8633 0.982543 12.6925
256 100.113 2.34822 4.8652 0.988244 12.7126
512 100.113 3.81234 4.88458 1.01865 12.7529
1024 100.113 5.61893 4.91081 1.11329 12.832
2048 100.114 7.73529 5.04384 1.18209 12.9912
4096 100.114 10.957 5.14305 1.36264 13.3132
8192 100.116 17.5923 5.37681 1.69636 13.9479
16384 100.119 28.9175 5.84546 3.49002 15.2083
32768 100.128 47.1316 7.26093 4.60788 17.715
65536 100.142 71.585 11.4572 9.61257 24.2485
131072 100.171 115.55 18.7428 17.3494 34.7828
262144 100.23 170.876 30.4098 31.349 56.4344
524288 100.329 233.281 60.184 53.2618 96.5959
1000000 100.503 249.507 104.635 89.1293 160.038

N=100,000 (ms)

k Pre-Sorter Partial Sorter Simple Quick Sorter Skewed Quick Sorter Heap Sorter
8 9.49702 0.197269 1.20338 0.130372 1.48769
16 9.49778 0.197649 1.20376 0.130752 1.48921
32 9.49778 0.197649 1.20414 0.131132 1.49225
64 9.49778 0.197649 1.20452 0.131512 1.49757
128 9.49816 0.44319 1.20718 0.137594 1.50821
256 9.49816 0.44357 1.21516 0.141015 1.52836
512 9.49816 0.834306 1.24101 0.158879 1.56979
1024 9.49854 1.50517 1.27407 0.192327 1.65189
2048 9.49854 2.66788 1.35313 0.250862 1.82749
4096 9.4993 4.58963 1.49339 0.542014 2.16805
8192 9.50006 7.60758 1.791 0.888279 2.83512
16384 9.50196 11.9475 2.49493 1.83433 4.11223
32768 9.50842 17.4326 3.71276 3.85453 6.5703
65536 9.51717 21.8413 6.40002 7.04808 11.2747
100000 9.52819 21.8949 9.20131 10.0223 15.5835

N=10,000 (ms)

k Pre-Sorter Partial Sorter Simple Quick Sorter Skewed Quick Sorter Heap Sorter
8 0.657563 0.055874 0.092743 0.025846 0.146716
16 0.657563 0.056254 0.093123 0.026607 0.148236
32 0.657563 0.056254 0.093123 0.030408 0.150137
64 0.657943 0.056634 0.095024 0.030788 0.154318
128 0.657943 0.141015 0.098825 0.031548 0.16192
256 0.657943 0.141015 0.109087 0.056634 0.177504
512 0.657943 0.307496 0.124671 0.068797 0.209051
1024 0.658323 0.5671 0.147857 0.12125 0.270627
2048 0.658323 0.960117 0.264166 0.190427 0.387315
4096 0.658703 1.40331 0.408221 0.345885 0.613091
8192 0.659083 1.67241 0.68759 0.652621 1.03424
10000 0.659463 1.67545 0.789075 0.772351 1.18855

N=1,000 (ms)

k Pre-Sorter Partial Sorter Simple Quick Sorter Skewed Quick Sorter Heap Sorter
8 0.061575 0.034969 0.021666 0.026226 0.015203
16 0.061575 0.035349 0.022046 0.026226 0.016344
32 0.061955 0.035349 0.022046 0.026606 0.017864
64 0.061955 0.035349 0.022426 0.026986 0.020905
128 0.061955 0.070698 0.043711 0.030407 0.026986
256 0.062335 0.071078 0.046752 0.04371 0.038389
512 0.062335 0.113648 0.060815 0.061195 0.058914
1000 0.062335 0.131893 0.082861 0.08362 0.090462

N=100 (ms)

k Pre-Sorter Partial Sorter Simple Quick Sorter Skewed Quick Sorter Heap Sorter
8 0.005701 0.020525 0.012543 0.014064 0.00266
16 0.006081 0.020905 0.012923 0.014064 0.003421
32 0.006081 0.020905 0.012923 0.014444 0.004561
64 0.006081 0.021285 0.013303 0.014824 0.006461
100 0.006461 0.021285 0.013683 0.015204 0.007602

I really don’t know if these microsesecond level measurements are reliable, as they were not what I intended to measure when I wrote the original timer code. Anyway, I think we can see a pattern. My custom sorters have a certain startup cost, that starts making a difference at . The heap based sorter seems to an excellent choice for that case. For , skewed incremental quicksort seems like the best option.

An interesting next step is to if it’s possible to make a sorter that behaves like the heap sorter for small , and like the skewed incremental quicksort for large . I would guess the main challenge is to avoid the overhead over switching between two possible implementations.

Vote on Hacker News Tweet
comments powered by Disqus