As we are already familiar with the basics of memory and data structures used by .NET applications, in this third post of the .NET Internals series we’re going to dig into boxing and unboxing and their performance implications.
What is boxing and unboxing?
In the previous post we learnt what value and reference types are, and that the former are typically stored on the stack (when used as local variables), whereas the latter are stored on the managed heap. So why should we care? Doesn’t the .NET runtime manage these data structures and their contents correctly, so that we don’t need to worry about them?
In fact, no. What’s crucial to know and understand are the implications of moving data from the stack to the heap and vice versa.
Remember:
- when a value type variable is assigned to a reference type variable (such as object), its data is copied from the stack to the heap; this is called boxing,
- when a boxed value held in a reference type variable is cast back to a value type variable, its data is copied from the heap to the stack; this is called unboxing.
Microsoft Docs examples illustrate these actions very well.
Consider the following example of boxing:
int i = 123;
// Boxing: copying the value of i into object o.
object o = i;
and the memory state as it executes:
To store the value 123 in an object, a “box” is created on the heap and the value is copied into it.
On the other hand, when unboxing happens:
int i = 123;      // "i" is a value type
object o = i;     // boxing "i" into "o"
int j = (int)o;   // unboxing "o" into "j"
this is how the contents of the stack and the heap change:
The value 123 is taken out of the “box” and placed back on the stack.
Notice that when the value type variable i is boxed into the object o, a reference is stored on the stack and the actual memory is allocated on the heap (as for all reference types). As soon as the unboxing happens, the data residing on the heap must be copied to the stack (variable j). In both cases our goal is to deal with the same value (123).
As you can imagine, these operations come at an additional cost and affect performance, which we’ll discuss in a moment.
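To make the copy semantics described above more tangible, here is a minimal sketch. The class and method names (BoxingDemo, BoxThenModify, UnboxToWrongTypeThrows) are made up for this illustration. It shows that the box holds its own copy of the value, and that unboxing only works when the cast names the exact value type that was boxed.

```csharp
using System;

public static class BoxingDemo
{
    // Boxes a value, then modifies the original variable.
    // The returned box still holds the value copied at boxing time.
    public static object BoxThenModify()
    {
        int i = 123;
        object o = i;   // boxing: 123 is copied into a new object on the heap
        i = 456;        // changes only the variable i, not the box
        return o;
    }

    // Unboxing requires the exact value type that was boxed (int here).
    public static bool UnboxToWrongTypeThrows()
    {
        object o = 123;          // boxed int
        try
        {
            long l = (long)o;    // the box holds an int, not a long
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(BoxThenModify());           // the box still holds 123
        Console.WriteLine(UnboxToWrongTypeThrows());  // the cast to long fails
    }
}
```

If a long is really needed, unboxing to int first and converting afterwards, as in (long)(int)o, works as expected.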
Let’s see some IL
When analyzing such performance or memory management aspects of our code (C# in this case), it’s often worth seeing what the Intermediate Language (IL) looks like.
We haven’t covered this concept yet, but as you probably know, when C# code is compiled into a DLL or EXE, these output files actually contain the IL code, which is later JIT-compiled and executed by the virtual machine (more details here). The .NET runtime must somehow know whether it should box or unbox a particular variable, as this requires some special memory allocation actions to be taken.
Let’s create a simple .NET Console Application with the following code in its Main method:
namespace BoxingUnboxingTest
{
    class Program
    {
        static void Main(string[] args)
        {
            int a = 5;
            object o = a;
            int b = (int)o;
        }
    }
}
Let’s compile the application so that we can find the BoxingUnboxingTest.exe file in the output directory. Now we will use ILSpy to see the IL code inside the executable.
As soon as the EXE file is opened in ILSpy, we can go directly to the compiled content of the Main(string[]) : void method, choosing the “IL with C#” view to make it simpler for us:
Notice the box instruction right after the assignment of the value type to the reference type (object obj = num). Similarly, the unbox.any instruction appears right after the assignment of the reference type to the value type (int num2 = (int)obj).
That’s how boxing and unboxing are represented in the IL.
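If ILSpy is not at hand, a rough check can also be done from code through reflection. The sketch below is only an illustration: IlPeek, RoundTrip and ContainsOpCode are hypothetical names, and the scan is deliberately naive, looking for the one-byte opcode anywhere in the method body without decoding operands. For a method this small that should be good enough, but it is not a real IL parser.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class IlPeek
{
    // Hypothetical helper: boxes and unboxes its argument, so the compiler
    // is expected to emit both instructions for it.
    public static int RoundTrip(int a)
    {
        object o = a;      // expected to compile to a "box" instruction
        return (int)o;     // expected to compile to "unbox.any"
    }

    // Naive scan: checks whether the one-byte opcode value occurs anywhere
    // in the raw IL. Operand bytes are not skipped, so false positives are
    // possible in larger methods.
    public static bool ContainsOpCode(MethodInfo method, OpCode opCode)
    {
        byte[] il = method.GetMethodBody().GetILAsByteArray();
        return Array.IndexOf(il, (byte)opCode.Value) >= 0;
    }

    public static void Main()
    {
        MethodInfo m = typeof(IlPeek).GetMethod(nameof(RoundTrip));
        Console.WriteLine("box:       " + ContainsOpCode(m, OpCodes.Box));
        Console.WriteLine("unbox.any: " + ContainsOpCode(m, OpCodes.Unbox_Any));
    }
}
```

For anything beyond a quick experiment, a decompiler such as ILSpy remains the more reliable way to read IL.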
When do we box and unbox?
The sample code above may seem naive and you may think “hey, but I never do such things”. That’s true in most cases, but values in our code are often boxed/unboxed even if we are not aware of that.
Non-generic collections
For instance, there still exists the old-school ArrayList collection, whose Add method takes an object parameter. It means that when we’d like to add an integer to an ArrayList:
ArrayList al = new ArrayList();
int i = 8;
al.Add(i);
boxing takes place.
Such issues were eliminated by generics and generic collections, such as List<T>, which store value types without boxing them.
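As a small side-by-side sketch of that difference (the class and method names below are invented for this example), the same two integers are summed once through ArrayList and once through the generic List<int>:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CollectionsDemo
{
    public static int SumWithArrayList()
    {
        ArrayList al = new ArrayList();
        al.Add(8);                 // Add(object): the int is boxed
        al.Add(9);
        int sum = 0;
        foreach (object item in al)
        {
            sum += (int)item;      // every element has to be unboxed
        }
        return sum;
    }

    public static int SumWithList()
    {
        List<int> list = new List<int>();
        list.Add(8);               // Add(int): the value is stored directly
        list.Add(9);
        int sum = 0;
        foreach (int item in list)
        {
            sum += item;           // no cast and no unboxing needed
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumWithArrayList()); // 17
        Console.WriteLine(SumWithList());      // 17
    }
}
```

Besides avoiding the box, the generic version is also type-safe: adding a string to a List<int> fails at compile time, whereas an ArrayList accepts it and the cast fails only at run time.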
String concatenation
Another interesting example is the concatenation of strings with value types using the “+” operator:
int i = 8;
string helloText = "Hello";
string result = helloText + i;
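Such an operation uses the overload of the String.Concat method that takes two object parameters, so the integer is boxed first. (Newer C# compilers may emit a call to ToString() on the integer instead, which avoids the box.)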
To avoid it, it’s enough to slightly modify the code by calling the ToString() method on the integer variable (for now ignoring ReSharper telling you it’s redundant ;)):
int i = 8;
string helloText = "Hello";
string result = helloText + i.ToString();
and there’s no boxing anymore, as now the String.Concat overload taking two string parameters is used.
There are many more cases we could present, but the goal is for you to get a good feel for what boxing and unboxing are and when they occur.
Performance implications of boxing and unboxing
As we already know, boxing and unboxing come at some cost. For simple things like concatenating a string with an integer once, the performance gain from calling ToString on the integer first is unnoticeable. That’s why, as I wrote before, even ReSharper tells you not to do it:
In that case it’s better to keep the code clear and skip the ToString call.
The perspective changes when you need to perform such operations in a loop hundreds or thousands of times. In this case the execution of code using boxing can take even 150% longer than its equivalent with no boxing (you can create a simple application and measure the execution time of code with and without boxing, or check this article).
Boxed values also take more space in memory than value types stored on the stack. Copying the value to and from the stack also has a cost. According to MSDN, boxing can take up to 20 times longer than a simple reference assignment, whereas unboxing can take 4 times as long as an assignment.
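A simple application for measuring this could look roughly like the sketch below. It is a rough illustration rather than a rigorous benchmark: BoxingTiming, SumBoxed and SumUnboxed are names made up here, the element count is arbitrary, and the numbers will vary with the machine, the runtime version and the build configuration, so no particular ratio should be expected.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

public static class BoxingTiming
{
    // Stores ints in an ArrayList: each Add boxes, each read unboxes.
    public static long SumBoxed(int count)
    {
        ArrayList items = new ArrayList();
        for (int i = 0; i < count; i++)
        {
            items.Add(i);
        }
        long sum = 0;
        foreach (object item in items)
        {
            sum += (int)item;
        }
        return sum;
    }

    // Stores ints in a List<int>: no boxing involved.
    public static long SumUnboxed(int count)
    {
        List<int> items = new List<int>();
        for (int i = 0; i < count; i++)
        {
            items.Add(i);
        }
        long sum = 0;
        foreach (int item in items)
        {
            sum += item;
        }
        return sum;
    }

    public static void Main()
    {
        const int count = 1000000;

        // Warm-up calls so that JIT compilation is not part of the timing.
        SumBoxed(1000);
        SumUnboxed(1000);

        Stopwatch sw = Stopwatch.StartNew();
        long boxedSum = SumBoxed(count);
        sw.Stop();
        Console.WriteLine("ArrayList (boxing): " + sw.ElapsedMilliseconds + " ms, sum = " + boxedSum);

        sw.Restart();
        long unboxedSum = SumUnboxed(count);
        sw.Stop();
        Console.WriteLine("List<int> (no boxing): " + sw.ElapsedMilliseconds + " ms, sum = " + unboxedSum);
    }
}
```

Running it in a Release build and repeating the measurement a few times gives a fairer picture than a single Debug run.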
So… why use boxing and unboxing?
Despite all their performance implications, boxing and unboxing were introduced into .NET for several reasons:
- there’s a unified type system in .NET, which allows both value and reference types to be “represented” in a similar way, thanks to boxing,
- collections could be used for value types before generics were introduced into .NET,
- it simplifies our code, as we saw for string concatenation, and in most cases this clarity gives us much more than the performance we’d gain by trying to avoid boxing.
Boxing and unboxing are so common that we can’t avoid them. We should be aware of how they work in order to minimize their usage, but this should always be done within reason. Don’t spend time optimizing your code by checking the IL all the time to reach the smallest possible number of box instructions. Keep in mind that the clarity of your code, its self-explanatory structure and ease of reading are sometimes much more valuable than a small or barely noticeable performance gain.
Summary
In today’s post we saw what boxing and unboxing are, how they’re represented in the IL code and what implications for performance they may have. I hope it clarifies these commonly mentioned concepts for you, at least a bit 😉
If you’re interested in digging into the nitty-gritty, low-level details of boxing and unboxing in .NET, I invite you to read this great post by Matt Warren about even more internal internals of boxing and unboxing 🙂
In the next posts in the series we’re going to start exploring garbage collection. If you have any suggestions or ideas for the coming articles, let me know in the comments (some of you already did on reddit – new posts scheduled 🙂 ).
See you (or better read you 😉 ) next week!
Nice explanation. Keep it up.
Thanks!
Very clear ! Thx
You’re welcome!
Hi, David. Thanks for the posts, lovely topics, and level of explanations.
Could you also add a post where you explain how the size of an object is measured?
Hey Jelena,
I’m glad you find them useful 😉 Sure, thanks for noticing that topic, I’ll try to mention it in one of the coming posts on garbage collection.
Take care,
Dawid.
Thank you for the excellent post.
Glad you liked it!