.NET Managed + C Unmanaged: What’s the Cost?

When I was programming in C#, I used to offload all recursion-heavy tasks to unmanaged C code, since .NET performance was problematic. Now, looking back at that experience, I wonder about the benefits of such a code split. Did I really gain from it, and if so, how much? And what is the best way to build an API around this approach?

Why?

Developing a project in two different languages is a dubious undertaking. Moreover, unmanaged code is genuinely difficult to implement, debug, and maintain. But the chance to make functionality run faster is worth considering, especially for critical code paths or high-load applications.

Another possible answer: the functionality already exists in unmanaged code. Why rewrite the entire solution if I can quickly wrap it for .NET and use it from there?

Preliminaries

The code was written in Visual Studio Community 2015. For the measurements, I used my PC with an Intel Core i5-3470, 12 GB of dual-channel 1333 MHz RAM, and a 7200 rpm hard disk. Timing was done with System.Diagnostics.Stopwatch, which is more precise than DateTime because it is implemented on top of the system performance counter. All tests ran against Release builds so the results would be as realistic as possible. I used .NET 4.5.2, and the C++ project was compiled with the /TC (Compile as C) option enabled.

Calling Functions

I began my research by evaluating function call speed. There are several reasons for this. First, we have to call functions anyway, and calls into a loaded DLL are slow compared with calls to code in the same module. Second, most existing C# wrappers over unmanaged code are implemented this way (for instance, sharpgl and openal-cs). It is simply the most obvious and straightforward way to embed unmanaged code.

Before measuring anything, we need to decide how to store and analyze the results. I chose CSV and wrote a simple class for storing data in this format:

It can hardly be called a full-featured solution, but it perfectly suits my needs. For testing, I wrote a simple class that can only sum numbers and store the result. Here is how it looks:

But that is the managed variant, and we need an unmanaged one as well. So I created a DLL project from the template and added a header file to it, say api.h, with the export definition:

Let’s put summer.c next to it and implement all the functionality we need:
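A minimal sketch of what such a header and implementation pair might look like (the EXPORT_API macro and the summer_* function names are my assumptions, not the author's originals):

```c
/* api.h: export definition (sketch; EXPORT_API is a hypothetical name) */
#ifdef _WIN32
#define EXPORT_API __declspec(dllexport)
#else
#define EXPORT_API  /* no decoration needed outside Windows */
#endif

/* summer.c: keeps a running total, mirroring the C# summing class */
static int total = 0;

/* add a value to the running sum and return the current total */
EXPORT_API int summer_add(int value)
{
    total += value;
    return total;
}

/* reset the stored result */
EXPORT_API void summer_reset(void)
{
    total = 0;
}
```

On the C# side, each exported function would then be bound with a [DllImport] declaration pointing at the compiled DLL.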

Now, we need a wrapper class over this mess:

As a result, we got exactly what we wanted: two implementations that are identical in terms of usage, one in C# and one in C. Now we can compare them and check the results! Let’s write code that measures the execution time of n calls on the same class:

The only thing left is to call this function somewhere from main and take a look at fun_call.csv. I don’t want to bore you with raw numbers, so here is a chart with time in ticks on the vertical axis and the number of function calls on the horizontal axis.

The result surprised me a bit: C# was the leader of this test. However, despite sitting in the same module and being eligible for inlining, the managed variant came out only marginally ahead; both variants performed quite similarly. In this particular case, splitting the code turned out to be useless: we gained nothing at all and complicated the project.

Arrays

After a short analysis of the results, I realized that I should pass data not one element at a time, but in arrays. So, it’s time to update the code. Let’s add the new functionality:

And here is the C part:
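The C part might look roughly like this (the function name and the plain int accumulator are my assumptions):

```c
#ifdef _WIN32
#define EXPORT_API __declspec(dllexport)
#else
#define EXPORT_API
#endif

/* sum `count` elements of `values` and return the total;
   the caller passes a raw pointer to the (pinned) managed array */
EXPORT_API int sum_array(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

On the C# side such a function is usually declared with [DllImport] and an int[] parameter; the marshaller pins the array for the duration of the call, so no copy is made.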

So, I had to rewrite the performance evaluation function. The full version is provided below. In few words, now we generate an array of n random elements and call the function to add them.

Now, let’s run it and check the report. Time in ticks is in the vertical direction, and the number of the array elements is in the horizontal direction.

Obviously, C copes with simple array processing much better. But that is the cost of manageability: while managed code throws an exception on overflow or an array bounds violation, C will simply overwrite memory that doesn’t belong to it.

File Read

Once it became clear that C processes big arrays much faster, I decided the next step was reading files. I wanted to check how fast each language interacts with the system.

For this, I generated a set of files whose sizes grow linearly.
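The article does not show how the files were produced; a generator along these lines would do (the file-name pattern and the one-integer-per-line format are my assumptions):

```c
#include <stdio.h>
#include <stdlib.h>

/* write `count` files; the i-th file holds i * step random integers,
   one per line, so file sizes grow roughly linearly */
static void generate_files(int count, int step)
{
    char name[64];
    for (int i = 1; i <= count; i++) {
        snprintf(name, sizeof name, "data_%d.txt", i);
        FILE *f = fopen(name, "w");
        if (!f)
            continue;
        for (int j = 0; j < i * step; j++)
            fprintf(f, "%d\n", rand() % 100);
        fclose(f);
    }
}
```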

As a result, the largest file was 75 MB, which is quite reasonable. For testing, I didn’t create a separate class and wrote the code right in the main class.

As you can see from the code, the task was to sum up all integers from the file.

Here is the corresponding implementation in C:
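Assuming the files store integers as text (the format is my assumption), the C implementation might look like this:

```c
#include <stdio.h>

#ifdef _WIN32
#define EXPORT_API __declspec(dllexport)
#else
#define EXPORT_API
#endif

/* open the file, sum every integer in it, and return the total;
   returns 0 if the file cannot be opened */
EXPORT_API long long sum_file(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return 0;

    long long total = 0;
    int value;
    while (fscanf(f, "%d", &value) == 1)
        total += value;

    fclose(f);
    return total;
}
```

Because only a path string goes in and a single number comes out, marshalling overhead here is negligible; almost all the measured time is real I/O and parsing work.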

Now, we only need to read all the files in a loop and measure the speed of each implementation. Here is a chart with the results.

As you can see from the chart, C turned out to be much faster, taking roughly half the time.

Returning Arrays

The next step in measuring performance was returning more complex types, since integers and floating-point numbers are not always convenient. So, we need to check the speed of converting unmanaged memory areas into managed ones. For this, I decided to implement a simple task: read an entire file and return its contents as a byte array.

In pure C#, this task can be implemented quite simply, but linking the C code with the C# code in this case required some extra thought.

First, a solution in C#:

And the corresponding solution in C:
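A sketch of what the C side might look like (the function names and the out-parameter convention for the size are my assumptions):

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#define EXPORT_API __declspec(dllexport)
#else
#define EXPORT_API
#endif

/* read the whole file into a malloc'ed buffer; the caller (the C#
   wrapper) receives the pointer and the size, and must later hand
   the pointer back to free_buffer */
EXPORT_API unsigned char *read_file(const char *path, long *size)
{
    *size = 0;
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;

    fseek(f, 0, SEEK_END);
    long length = ftell(f);
    fseek(f, 0, SEEK_SET);

    unsigned char *buffer = malloc((size_t)length);
    if (buffer && fread(buffer, 1, (size_t)length, f) == (size_t)length) {
        *size = length;
    } else {
        free(buffer);
        buffer = NULL;
    }

    fclose(f);
    return buffer;
}

/* release a buffer returned by read_file */
EXPORT_API void free_buffer(unsigned char *buffer)
{
    free(buffer);
}
```

The unmanaged side must expose the matching free function because the buffer comes from the C runtime's heap; the C# wrapper cannot release it itself.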

To call such a function from C# successfully, we need to write a wrapper that calls it, copies the data from unmanaged memory to managed memory, and frees the unmanaged buffer:

As for the measurement functions, only the calls to the functions under test changed. The result looks as follows:

Even with the time spent on memory copying, C again took the lead and completed the task twice as fast. Frankly speaking, I expected somewhat different results, given the second test. The reason is that reading the data in C# takes a long time, while in C the time is spent copying from unmanaged memory to managed memory.

Real Task

A logical conclusion to all these tests was to implement a full-featured algorithm in both C# and C and evaluate the performance.

As the algorithm, I used reading an uncompressed TGA file with 32 bits per pixel and converting it to an RGBA view (the TGA format stores color as BGRA). In addition, we will return not raw bytes, but Color structures.

The implementation of the algorithm is quite bulky and hardly interesting.

And here is a C variant:
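The heart of the C variant is the BGRA-to-RGBA conversion; here is a sketch of just that inner step (the structure layout and function name are my assumptions, and real code would first parse the 18-byte TGA header):

```c
/* one 32-bit pixel in the managed layout; TGA itself stores BGRA */
typedef struct {
    unsigned char r, g, b, a;
} Color;

/* convert `count` BGRA pixels read from a TGA file into RGBA colors */
void bgra_to_rgba(const unsigned char *bgra, Color *out, int count)
{
    for (int i = 0; i < count; i++) {
        const unsigned char *p = bgra + i * 4;
        out[i].b = p[0];  /* blue comes first in the file */
        out[i].g = p[1];
        out[i].r = p[2];
        out[i].a = p[3];
    }
}
```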

Now, we need to draw a simple TGA image and load it n times. The result is the following (time in ticks on the vertical axis, the number of file reads on the horizontal axis):

Note that I intentionally used C’s features in its favor. Reading from the file straight into a structure made my life much easier (in cases where structures are not aligned to 4 bytes, debugging becomes really painful). Still, I’m happy with the result: I managed to implement a simple algorithm in C and then successfully use it from C#. Thus, I got the answer to my initial question: we really can win, but not always. Sometimes we win a little, sometimes not at all, and sometimes a lot.

Conclusion

The idea of handing off part of the implementation to another language is questionable, as I wrote at the very beginning. After all, there are not many situations where this speed-up technique is the right tool. If opening a file freezes the UI, you can move the loading to a separate background thread, and then even a 1-second load will cause no trouble.

Accordingly, it is worth the cost only when performance is really needed and cannot be improved in any other way, or when a ready-to-use algorithm already exists.

Note that a simple wrapper over an unmanaged DLL will not improve performance much; the full speed of unmanaged languages shows only when processing large volumes of data.

C# copes perfectly with passing managed resources to unmanaged code, but the reverse conversion takes too much time. That is why it is better to avoid frequent data conversion and to keep unmanaged resources on the unmanaged side. If there is no need to read or modify the data in managed code, you can store the pointer in an IntPtr and leave the rest to the unmanaged code.

Of course, we can (and should) do additional research before making the final decision to move code into unmanaged assemblies. But the information above is enough to judge whether such a move is viable.

That’s it. Thank you for reading.

The article was translated by the CodingSight team with the permission of the author.

  • Orion Edwards

    There’s a bunch of things you can do in C# to make it “unsafe” like C – e.g. using unchecked(), and unsafe functions using pointers, and so on and so forth. Likewise I think you could probably improve the performance of C# by using structs in place of classes in key places.

    I feel like all of those things would be a better first-step to do for performance than P/invoking through to C… The overhead of having multiple projects and having to manage dllimports and stuff is significant!