How does the C# compiler optimize a code fragment?

If I have code like this:

for(int i=0;i<10;i++)
{
    int iTemp;
    iTemp = i;
    //.........
}

Does the compiler instantiate iTemp 10 times?

Or does it optimize it?

I mean, if I rewrite the loop as

int iTemp;
for(int i=0;i<10;i++)
{
    iTemp = i;
    //.........
}

Will it be faster?

Answers


Using Reflector, you can view the IL generated by the C# compiler:

.method private hidebysig static void Way1() cil managed
{
    .maxstack 2
    .locals init (
        [0] int32 i)
    L_0000: ldc.i4.0 
    L_0001: stloc.0 
    L_0002: br.s L_0008
    L_0004: ldloc.0 
    L_0005: ldc.i4.1 
    L_0006: add 
    L_0007: stloc.0 
    L_0008: ldloc.0 
    L_0009: ldc.i4.s 10
    L_000b: blt.s L_0004
    L_000d: ret 
}

.method private hidebysig static void Way2() cil managed
{
    .maxstack 2
    .locals init (
        [0] int32 i)
    L_0000: ldc.i4.0 
    L_0001: stloc.0 
    L_0002: br.s L_0008
    L_0004: ldloc.0 
    L_0005: ldc.i4.1 
    L_0006: add 
    L_0007: stloc.0 
    L_0008: ldloc.0 
    L_0009: ldc.i4.s 10
    L_000b: blt.s L_0004
    L_000d: ret 
}

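They're exactly the same, so it makes no performance difference where you declare iTemp. Note that iTemp doesn't appear in either locals list at all: its value is never read, so the compiler dropped it.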
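If you don't have a disassembler handy, you can get a rough version of the same check from inside the program with reflection. The sketch below is only an illustration: the class name LocalsInspector and the helper CountLocals are made up for this example, and the loop bodies add to a sum so that iTemp is actually read. The counts it prints depend on the compiler version and on whether optimizations are enabled, so treat them as something to observe rather than a guaranteed result.

```csharp
using System;
using System.Reflection;

// Hypothetical helper class for this example only.
public static class LocalsInspector
{
    // iTemp is declared inside the loop body.
    public static int Way1()
    {
        int sum = 0;
        for (int i = 0; i < 10; i++)
        {
            int iTemp;
            iTemp = i;
            sum += iTemp;
        }
        return sum;
    }

    // iTemp is declared once, before the loop.
    public static int Way2()
    {
        int sum = 0;
        int iTemp;
        for (int i = 0; i < 10; i++)
        {
            iTemp = i;
            sum += iTemp;
        }
        return sum;
    }

    // Returns the number of local variable slots recorded in the method body.
    public static int CountLocals(string methodName)
    {
        var method = typeof(LocalsInspector).GetMethod(
            methodName, BindingFlags.Public | BindingFlags.Static);
        return method.GetMethodBody().LocalVariables.Count;
    }

    public static void Main()
    {
        Console.WriteLine("Way1 locals: " + CountLocals("Way1"));
        Console.WriteLine("Way2 locals: " + CountLocals("Way2"));
    }
}
```

If the two numbers match, the declaration site made no difference to how many slots the compiler reserved.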


As others have said, the code you've shown produces equivalent IL. The exception is when the variable is captured by a lambda expression for later execution. In that case the generated code is different, because it must keep track of the variable on behalf of the expression. There may be other cases where the optimization doesn't take place as well.

Creating a fresh copy of the loop variable is a common technique when you want to capture the value for a lambda expression.

Try:

var a = new List<int> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

var q = a.AsEnumerable();
int iTemp;
for(int i=0;i<10;i++) 
{ 
    iTemp = i;
    q = q.Where( x => x <= iTemp );
}

Console.WriteLine(string.Format( "{0}, count is {1}",
    string.Join( ":", q.Select( x => x.ToString() ).ToArray() ),
    q.Count() ) );

and

var a = new List<int> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

var q = a.AsEnumerable();
for(int i=0;i<10;i++) 
{ 
    var iTemp = i;
    q = q.Where( x => x <= iTemp );
}

Console.WriteLine(string.Format( "{0}, count is {1}",
    string.Join( ":", q.Select( x => x.ToString() ).ToArray() ),
    q.Count() ) );
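To make the difference between the two snippets easier to see side by side, here is a minimal self-contained sketch of the same experiment. The class and method names (CaptureDemo, SharedVariable, FreshVariable) are invented for this example, and Enumerable.Range stands in for the list literal. Because Where is evaluated lazily, the filters only run when ToList is called, after the loop has finished.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class for this example only.
public static class CaptureDemo
{
    // One iTemp is shared by every lambda, so all ten filters
    // compare against its final value (9) when the query runs.
    public static List<int> SharedVariable()
    {
        IEnumerable<int> q = Enumerable.Range(0, 10);
        int iTemp;
        for (int i = 0; i < 10; i++)
        {
            iTemp = i;
            q = q.Where(x => x <= iTemp);
        }
        return q.ToList();
    }

    // Each iteration gets its own iTemp, so the ten filters
    // compare against 0, 1, ..., 9 and only 0 passes them all.
    public static List<int> FreshVariable()
    {
        IEnumerable<int> q = Enumerable.Range(0, 10);
        for (int i = 0; i < 10; i++)
        {
            int iTemp = i;
            q = q.Where(x => x <= iTemp);
        }
        return q.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(":", SharedVariable())); // 0:1:2:3:4:5:6:7:8:9
        Console.WriteLine(string.Join(":", FreshVariable()));  // 0
    }
}
```

So in this one situation the declaration site changes behavior, not just the generated code: the first version keeps all ten elements and the second keeps only 0.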

If you're really curious about how CSC (the C# compiler) treats your code, you might want to play with LINQPad. Among other things, it lets you enter short C# expressions or programs and look at the resulting IL (CLR bytecode).


One thing to keep in mind is that local variables are typically allocated on the stack. One task that a compiler must do is figure out how much stack space a particular method requires and set that aside.

Consider:

int Func(int a, int b, int c)
{
    int x = a * 2;
    int y = b * 3;
    int z = c * 4;
    return x + y + z;
}

Ignoring the fact that this can be easily optimized to be return (a * 2) + (b * 3) + (c * 4), the compiler is going to see three local variables and set aside room for three local variables.

If I have this:

int Func(int a, int b, int c)
{
    int x = a * 2;
    {
        int y = b * 3;
        {
            int z = c * 4;
            {
                return x + y + z;
            }
        }
    }
}

It's still the same 3 local variables - just in different scopes. A for loop is nothing but a scope block with a little glue code to make it work.

Now consider this:

int Func(int a, int b, int c)
{
    int x = a * 2;
    {
        int y = b * 3;
        x += y;
    }
    {
        int z = c * 4;
        x += z;
    }
    return x;
}

This is the only case where it could be different. You have variables y and z which go in and out of scope - once they are out of scope, the stack space is no longer needed. The compiler could choose to reuse those slots such that y and z share the same space. As optimizations go, it's simple but it doesn't gain much - it saves some space, which might be important on embedded systems, but not in most .NET applications.

As a side note, the C# compiler in VS2008 in release mode isn't even performing the simplest optimizations, such as eliminating the temporary locals. The IL for the first version is this:

L_0000: ldarg.0 
L_0001: ldc.i4.2 
L_0002: mul 
L_0003: stloc.0 
L_0004: ldarg.1 
L_0005: ldc.i4.3 
L_0006: mul 
L_0007: stloc.1 
L_0008: ldarg.2 
L_0009: ldc.i4.4 
L_000a: mul 
L_000b: stloc.2 
L_000c: ldloc.0 
L_000d: ldloc.1 
L_000e: add 
L_000f: ldloc.2 
L_0010: add 
L_0011: ret 

whereas I fully expected to see this:

L_0000: ldarg.0 
L_0001: ldc.i4.2 
L_0002: mul 
L_0003: ldarg.1 
L_0004: ldc.i4.3 
L_0005: mul 
L_0006: add 
L_0007: ldarg.2 
L_0008: ldc.i4.4 
L_0009: mul 
L_000a: add 
L_000b: ret 

The compiler will do the optimisation you've shown for you.

It's a simple form of loop hoisting.


A lot of people have provided you IL to show you that your two code fragments are effectively the same from a performance perspective. It's not really necessary to go to that level of detail to see why this is the case. Just think about this from the perspective of the call stack.

Effectively, at the beginning of a method containing a fragment like either of yours, the compiler emits code that allocates space for all locals used anywhere in that method.

In both cases the compiler sees a local named iTemp, so when it allocates stack space for the locals it sets aside 32 bits to hold iTemp. It doesn't matter to the compiler that iTemp has a different scope in the two fragments; it enforces the scope simply by not allowing you to refer to iTemp outside the for loop in the first fragment. The space is allocated once (at the beginning of the method) and reused as needed during the loop in the first fragment.
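If you'd rather measure than reason about it, a crude timing harness like the one below is enough for a sanity check. It's only a sketch: LoopTiming, DeclaredInside, DeclaredOutside and the iteration count are all made up for this example, and Stopwatch numbers on a single run are noisy, so don't read much into small differences. Build it with optimizations on and run it outside the debugger.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing harness for this example only.
public static class LoopTiming
{
    // Arbitrary iteration count, large enough to take measurable time.
    const int Iterations = 100000000;

    // iTemp is declared inside the loop body.
    public static long DeclaredInside()
    {
        long sum = 0;
        for (int i = 0; i < Iterations; i++)
        {
            int iTemp;
            iTemp = i;
            sum += iTemp;
        }
        return sum;
    }

    // iTemp is declared once, before the loop.
    public static long DeclaredOutside()
    {
        long sum = 0;
        int iTemp;
        for (int i = 0; i < Iterations; i++)
        {
            iTemp = i;
            sum += iTemp;
        }
        return sum;
    }

    public static void Main()
    {
        // Warm-up calls so JIT compilation is not part of the measurement.
        DeclaredInside();
        DeclaredOutside();

        Stopwatch sw = Stopwatch.StartNew();
        long a = DeclaredInside();
        long insideMs = sw.ElapsedMilliseconds;

        sw.Restart();
        long b = DeclaredOutside();
        long outsideMs = sw.ElapsedMilliseconds;

        Console.WriteLine("inside:  " + insideMs + " ms (sum " + a + ")");
        Console.WriteLine("outside: " + outsideMs + " ms (sum " + b + ")");
    }
}
```

Both methods compute the same sum, so any timing gap you see should be within run-to-run noise if the reasoning above holds.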


The C# compiler doesn't always need to do a good job. The JIT optimizer is tuned for the IL that the C# compiler emits; better-looking IL does not (necessarily) produce better-looking machine code.

Let's take an earlier example:

static int Func(int a, int b, int c)
{
    int x = a * 2;
    int y = b * 3;
    int z = c * 4;
    return x + y + z;
}

The emitted IL from the 3.5 compiler with optimizations enabled looks like this:

.method private hidebysig static int32  Func(int32 a,
                                             int32 b,
                                             int32 c) cil managed
{
  // Code size       18 (0x12)
  .maxstack  2
  .locals init (int32 V_0,
           int32 V_1,
           int32 V_2)
  IL_0000:  ldarg.0
  IL_0001:  ldc.i4.2
  IL_0002:  mul
  IL_0003:  stloc.0
  IL_0004:  ldarg.1
  IL_0005:  ldc.i4.3
  IL_0006:  mul
  IL_0007:  stloc.1
  IL_0008:  ldarg.2
  IL_0009:  ldc.i4.4
  IL_000a:  mul
  IL_000b:  stloc.2
  IL_000c:  ldloc.0
  IL_000d:  ldloc.1
  IL_000e:  add
  IL_000f:  ldloc.2
  IL_0010:  add
  IL_0011:  ret
} // end of method test::Func

Not very optimal, right? I compiled it into an executable and called it from a simple Main method, and the compiler isn't inlining it or doing any real optimization.

So what is happening at runtime?

The JIT is in fact inlining the call to Func() and producing much better code than you might imagine when looking at the stack-based IL up above:

mov     edx,dword ptr [rbx+10h]
mov     eax,1
cmp     rax,rdi
jae     000007ff`00190265

mov     eax,dword ptr [rbx+rax*4+10h]
mov     ecx,2
cmp     rcx,rdi
jae     000007ff`00190265

mov     ecx,dword ptr [rbx+rcx*4+10h]
add     edx,edx
lea     eax,[rax+rax*2]
shl     ecx,2
add     eax,edx
lea     esi,[rax+rcx]
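To read the last six instructions of that listing: the three multiplications have been replaced with cheaper operations, and the temporaries live only in registers. The sketch below writes the same arithmetic by hand in C# next to the original Func, purely to show that the two forms agree. FuncLikeJit and the class name are invented for this example, and the mapping of instructions to source lines in the comments is my reading of the listing, not something the JIT reports.

```csharp
using System;

// Hypothetical class for this example only.
public static class JitArithmetic
{
    // The straightforward version from the answer.
    public static int Func(int a, int b, int c)
    {
        int x = a * 2;
        int y = b * 3;
        int z = c * 4;
        return x + y + z;
    }

    // The same arithmetic written the way the machine code computes it.
    public static int FuncLikeJit(int a, int b, int c)
    {
        int x = a + a;       // add edx,edx           (a * 2)
        int y = b + b * 2;   // lea eax,[rax+rax*2]   (b * 3)
        int z = c << 2;      // shl ecx,2             (c * 4)
        return (y + x) + z;  // add eax,edx then lea esi,[rax+rcx]
    }

    public static void Main()
    {
        Console.WriteLine(Func(1, 2, 3));        // 20
        Console.WriteLine(FuncLikeJit(1, 2, 3)); // 20
    }
}
```

The mov/cmp/jae groups before those instructions appear to be array element loads with bounds checks, which suggests the arguments came from an array in the calling Main.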
