Sorting in .NET

Sorting is a typical task each programmer should be aware of. That’s why this article is dedicated to the implementation of sorting in .NET. I will describe how array sorting works in .NET, its aspects, and make a small comparison with sorting in Java.

Let’s begin with the fact that the first versions of .NET use the quicksort algorithm by default. So, let’s consider pros and cons of the quicksort.

Pros

  1. One of the most high-performance algorithms(on a practical level) of general-purpose internal sorting.
  2. Easy implementation.
  3. Requires just O(logn) of additional memory for its operation.
  4. Can be easily combined with cashing and internal memory mechanisms.


Cons

  1. Heavily decreases in speed down to O(n2) in the case of unsuccessful pivot selections. This may happen in the case of unsuccessful input data. To prevent such situations, we need to select a pivot accidentally, not in a fixed way.
  2. Lame implementation of the algorithm may result in stack overflow error, since it may require O(n) embedded recursive calls. In improved implementations, where recursive call takes place only for sorting of the smaller part of an array, the recursion depth does not exceed O(logn).
  3. It is unstable. If stability is required, a key must be expanded.

The lame implementation may look in the following way:

The above algorithm has the following cons:

  1. Since pivot is selected as the middle of an array, it may be the case that it always will be the maximum. As a result, an array will be divided into two parts of n — 1 and 1 in length, and the algorithm speed will decrease to O(n2).
  2. Under the specified conditions, the recursion depth reaches O(n). As a result, software stack overflow may take place if n values are large.
  3. The algorithm is unstable. That is, it modifies elements with similar values. If we sort numbers, it does not have any effect. But if we sort an array of objects by some property, it has vital importance, as several calls of the Sort method may result in an array with elements of different order.

Sorting Implementation in .NET

.NET 1.0

Let’s see what happens in .NET 1.0. Leaping ahead, I can say that we won’t see anything positive here, especially for user-defined value types (in particular, because of the lack of generalizations).

Now, let’s take a look at the SorterObjectArray and SorterGenericArray classes:

SorterObjectArray

SorterGenericArray

So, what does actually happen here? The following code:

is nothing else but an attempt to use array covariance that works only for referenced types. So, the SorterObjectArray class is used for referenced types, while the SorterGenericArray class is used for value types. But what is the difference between the classes? The only difference is the method of accessing array elements. The slow GetValue and SetValue methods are used for value types. So, will an array of integers be sorted very long (since integer is a value type)? No! It is sorted very fast because of the following code:

The Array.TrySZSort method is interesting, in particular. This method calls the native sorting implemented on С++ in CLR itself. It works for primitive types when we use the standard logic of element comparison. That is, when comparer == Comparer.Default || comparer == null.

The native implementation looks in the following way:

Native TrySZSort

Native QuickSort

As you see, the native sorting works only for primitive types, including all numeric types+logical type+character type. As for the user-defined types, everything will work extremely slow.

Let’s consider the implementation of the sorting algorithm as such. We will consider its implementation in the SorterObjectArray class since the native implementation and the implementation for value types are similar.

1. An array middle is always taken as a pivot:

When input data is bad, the algorithm execution time may become quadratic. Besides, the middle is taken on the basis of the formula num1 + num2 >> 1 that may result in the int type overflow. The same error has been made in binary search and sorting algorithm in Java (link to the bug).

This problem has not been fixed in the next versions of Java.

2. To avoid stack overflow, this implementation has an optimization that eliminates one recursion branch. Instead of calling recursive division procedure for both found subarrays after array division, the recursive call is made only for smaller subarray, while the bigger one is processed in a loop within the same procedure call. In terms of efficiency, there is no difference in the average case: the overhead cost of additional recursive call and length comparison of subarrays and the loop are approximately the same. Meanwhile, the recursion depth never exceeds log2n, and in the worst case of confluent division, it will be no more than 2 – the whole processing will be held within the loop of the first level of recursion.

.NET 2.0

The new implementation features slight changes. Since generalizations have been introduced in .NET 2.0, I will use the generalized version of sorting.

The sorting method itself looks in the following way:

Optimization for in-built primitive types still exists despite the availability of generalizations. Thus, primitive types still use native sorting.

Now, the median of the first, middle and last elements of an array is taken as a pivot instead of the array middle.

Besides, now the middle is calculated according to the formula index1 + (index2 — index1 >> 1) that eliminates overflow errors.

The rest has been left intact.

Now, let’s suppose we need to sort an array of integers in descending order. How will you do it?

On my computer, the following code:

works approximately 3 times faster than

You may be confused with the fact that the Array.Reverse method is not generalized, which means that it will work slowly with the value types (packaging and the GetValue, SetValue methods). But if we look at its implementation, we will see optimization for built-in value types. In particular, it will call the Array.TrySZReverse native method that looks in the following way:

In general, we can find generalizations everywhere in the standard library. And it’s strange that there is no a generic version of this method. There is the Reverse method as an extension method of Enumerable, but its disadvantage is that it makes this not on site. So, the call of Array.Reverse on an array of user-defined value types always results in autoboxing.

.NET 3.0 — .NET 4.0

The algorithm has not been changed.

.NET 4.5

Before we start considering the algorithm, I’d like to say a few words about the deployment of .NET 4.5. I highly recommend you to have a look at this article in this regard. When installing VS 2012, i.e. when installing NET 4.5, it replaces assemblies of the 4th framework. In fact, it means that even if you are coding on .NET 4, you are using the .NET 4.5 assemblies. So, before the installation of 4.5, you use one sorting algorithm, and after installation, you use the other algorithm, and it happens without your knowing.

To find out what is happening, let’s have a look at the following code from .NET 4.5:

As you see, the method includes checking the .NET version: if it is 4.5, we use IntrospectiveSort, if it is 4.0, we use DepthLimitedQuickSort.

Let’s find out how DepthLimitedQuickSort differs from the sorting used in .NET 4.0 before the VS 2012 installation. Let’s take a look at the code of the method:

As you can see, it is the same quicksort except for one thing: the algorithm switches to the heapsort if we exhaust recursion depth that equals 22 by default.
The heapsort looks in the following way:

The DepthLimitedQuickSort algorithm is nothing but IntroSort.

Introsort or introspective sort is a hybrid sorting algorithm that provides both fast average performance and (asymptotically) optimal worst-case performance. It begins with quicksort and switches to heapsort when the recursion depth exceeds a level based on (the logarithm of) the number of elements being sorted. This combines the good parts of both algorithms, with practical performance comparable to quicksort on typical data sets and worst-case O(n log n) runtime due to the heap sort. Since both algorithms it uses are comparison sorts, it too is a comparison sort.

Now, let’s look at IntrospectiveSort. In fact, this is the same introspective sorting, but more optimized. By the way, MSDN still states it uses quicksort.

IntroSort


PickPivotAndPartition

InsertionSort

Now sorting in arrays is represented by a mix of sortings: insertion sort, quicksort, and heapsort.
Usage of Introsort has a positive effect on productivity since data can be partially ordered in real tasks. As we all know, the insertion sort works quite fast with such data.

Productivity Comparison

Productivity Comparison

Comparison with Java

Java drastically differs from .NET in terms of sorting. However, similar to .NET, the Java algorithm has also been changing.

As we know, quicksort is unstable, which is a disadvantage for sorting of referenced types. In Java, this problem doubles. That’s why merge sort is used for sorting linked types. This sorting is stable and ensures execution for O(n logn) of time in the worst case. However, it requires O(n) of additional memory.

Since the stability issue refers only to referenced types, changing elements with one key does not matter for primitive types. That’s why, the improved quicksort algorithm, DualPivotQuicksort, is used for sorting primitive types in Java. Normal Quicksort divides an array into two parts by selecting a random element P. Then, it sorts an array in a way so that all its elements which are less than P are located in the first part, while the rest of elements are located in the second part. DualPivotQuicksort divides an array into three parts instead of two. As a result, the number of moves of array elements drastically decreases.

In Java 7, the sorting algorithm for referenced types has been changed to TimSort.

Timsort is a hybrid stable sorting algorithm, derived from merge sort and insertion sort, designed to perform well on many kinds of real-world data. It was implemented by Tim Peters in 2002. The algorithm finds subsequences of the data that are already ordered and uses that knowledge to sort the remainder more efficiently. This is done by merging an identified subsequence, called a run, with existing runs until certain criteria are fulfilled.

Timesort is fast, but in terms of random data, it loses to the quicksort which is 30-percent faster.

What do you think of this difference of sorting implementation in two frameworks? Do we really need the stability that requires much time (as in Java) and memory for real tasks? Can we prefer speed and memory saving to stability (as in .NET)? Personally, I prefer .NET, since I suppose that stability is required only for certain tasks (I have never used it for last 4 years). Even when stability is highly required, we can put this problem on programmer’s shoulders – I think it is not hard to implement the algorithm of stable sorting.

Conclusion

Perhaps, not every programmer requires so many details on .NET, but still this knowledge never gets in the way. Thank you for reading.

Please feel free to ask any questions and share your thoughts in the comments section.

Timur Guev

Timur Guev

Timur is an experienced C# developer. For last three years, Timur has been developing the KSS (Kaspesky Subscription Service) highload system with the C#, SQL Server,and Azure technologies. At loose hours, Timur is teaching mathematics.
Timur Guev

Latest posts by Timur Guev (see all)

Timur Guev

Timur is an experienced C# developer. For last three years, Timur has been developing the KSS (Kaspesky Subscription Service) highload system with the C#, SQL Server, and Azure technologies. At loose hours, Timur is teaching mathematics.

  • Aron

    I love these articles! Very interesting insights into the internals of .NET. Keep em coming!

    • Tim Guev

      thanks!

  • Can be easily combined with cashing

    I assume you mean caching? 😉