Crossroad >> Code >> Article
Google
 
Web akhilesh.in
Home
 » Articles
 » PalmOS



C/C++ Code Optimization Techniques


Rules to Remember
  1. Find the bottleneck of your Application / module. 80 - 20 rule also applies over here which means most of the execution goes in a smaller part of the code. You should use some profiling tool to fing the bottle nect of your application before going into the optimization part.

  2. C/C++ compilers are queit efficient and they procuce optimized binary for the written code. So micro-optimization like using register variables will not be usesful. Let the compiler find the best way to utilize the variables.

  3. Some guidelines for optimizing C/C++ code:

    a. Get same calculations/comparisons out of the loop.


    for (int i=0; i<iMax; i++)
    {
          if (Val > CONST_VAL)
          {
                ...
          }
          else
          {
                ....
          }
    }


    This can be optimized as:


    if (Val > CONST_VAL)
    {
          for (int i=0; i<iMax; i++)
          {
              .....
           }
    }
    else
    {
          for (int i=0; i<iMax; i++)
          {
                .....
          }
    }

b. Avoid type conversions, use same type of variables including signed and unsigned.

unsigned long x, sum;
for (int i = 0; i<Max_I; i++)
{
      sum = x*i;
}

Optimized:

unsigned long x, sum;
for (unsigned long i = 0; i<Max_I; i++)
{
      sum = x*i;
}


c. Interger calculations are faster than floating point.

d. Keep in mind the processors word length. On 32 bit processor 32-bit calculations and copying will be faster or equal to 16-bit or 8-bit.


e. Interger Multiply/Divide by two is fastest using bit shift operations.


f. Multiplication and Division operations are generally costlier than Addition Substraction. Try to convert multiplication/division to addition/substraction.


An Example:
Before

These are constants to the loop
extentY, extentX, TopLeftX, TopLeftY and are of type unsigned long

#define Target(cx,cy) pOffBits[serverResX * (cy) + (cx)]
#define Source(cx,cy) pBitsI[extentX * (cy) + (cx)]

for (int y = 0; y<extentY; y++)
      for (int x = 0; x<extentX; x++)
      {
            Target(x + TopLeftX, y + TopLeftY) = Source(x,y);
      }


Optimized

unsigned long intermC = serverResX * TopLeftY + TopLeftX;
unsigned long intermZ = 0;
unsigned long intermY = 0;
for (unsigned long y = 0; y<extentY; y++)
{
      unsigned long intermA = intermY + intermC;
      for (unsigned long x = 0; x<extentX; x++)
      {
            pOffBits[intermA + x] = pBitsI[intermZ + x];
      }
      intermZ += extentX;
      intermY += serverResX;
}

Step By Step
1. Replacing the #define and making the loop variable data type same as that of the others

for (unsigned long y = 0; y<extentY; y++)
for (unsigned long x = 0; x<extentX; x++)
{
      pOffBits[serverResX * (y + TopLeftY) + (x + TopLeftX)] = pBitsI[extentX * y + x];
}

2. Rearranging the calculation

pOffBits[serverResX * y + x + serverResX * TopLeftY + TopLeftX] = pBitsI[extentX * y + x];


Here (serverResX * TopLeftY + TopLeftX) is constant so we can take it out of the loop

unsigned long intermC = serverResX * TopLeftY + TopLeftX;
for (unsigned long y = 0; y<extentY; y++)
      for (unsigned long x = 0; x<extentX; x++)
      {
            pOffBits[serverResX * y + x + intermC] = pBitsI[extentX * y + x];
      }


3. Now here both serverResX * y and extentX * y are constant in the inner loop so we can take it out of the innder loop.

unsigned long intermC = serverResX * TopLeftY + TopLeftX;
for (unsigned long y = 0; y<extentY; y++)
{
      unsigned long intermA = serverResX * y;
      unsigned long intermB = extentX * y;

      for (unsigned long x = 0; x<extentX; x++)
      {
            pOffBits[intermA + x + intermC] = pBitsI[intermB + x];
      }
}


4. Now you can see the multiplication serverResX * y going in the loop. Where y ranges from 0 to extentY. We can convert it to addition like this:

unsigned long intermC = serverResX * TopLeftY + TopLeftX;
unsigned long intermY = 0;

for (unsigned long y = 0; y<extentY; y++)
{
      unsigned long intermA = intermY;
      unsigned long intermB = extentX * y;

      for (unsigned long x = 0; x<extentX; x++)
      {
            pOffBits[intermA + x + intermC] = pBitsI[intermB + x];
      }

      intermY += serverResX;
}


5. Similerly for extentX, and we gan get rid of intermA and intermB

unsigned long intermC = serverResX * TopLeftY + TopLeftX;
unsigned long intermZ = 0;
unsigned long intermY = 0;

for (unsigned long y = 0; y<extentY; y++)
{

      for (unsigned long x = 0; x<extentX; x++)
      {
            pOffBits[intermY + x + intermC] = pBitsI[intermZ + x];
      }

      intermZ += extentX;
      intermY += serverResX;
}


6. Also since intermY + intermC is constant in the inner loop we can take it out

unsigned long intermC = serverResX * TopLeftY + TopLeftX;
unsigned long intermZ = 0;
unsigned long intermY = 0;

for (unsigned long y = 0; y<extentY; y++)
{
      unsigned long intermA = intermY + intermC;

      for (unsigned long x = 0; x<extentX; x++)
      {
            pOffBits[intermA + x] = pBitsI[intermZ + x];
      }

      intermZ += extentX;
      intermY += serverResX;
}


This is the optimized algorithm.
Note: All datatypes are same over here


Comments, questions, please mail me.

Akhilesh Singh
Last Updated 8th January 2005


Crossroad | About Me | Science | Life | Code | Contact | Site Map | Search
Copyright © 2006 - 2009 akhilesh.in