C/C++ Code Optimization Techniques
Rules to Remember
- Find the bottleneck of your Application / module. 80 - 20 rule
also applies over here which means most of the execution goes
in a smaller part of the code. You should use some profiling tool
to fing the bottle nect of your application before going into
the optimization part.
- C/C++ compilers are queit efficient and they procuce optimized
binary for the written code. So micro-optimization like using
register variables will not be usesful. Let the compiler find
the best way to utilize the variables.
- Some guidelines for optimizing C/C++ code:
a. Get same calculations/comparisons out of the loop.
for (int i=0; i<iMax; i++)
{
if (Val > CONST_VAL)
{
...
}
else
{
....
}
}
This can be optimized as:
if (Val > CONST_VAL)
{
for (int i=0; i<iMax;
i++)
{
.....
}
}
else
{
for (int i=0; i<iMax;
i++)
{
.....
}
}
b. Avoid type conversions, use same type of variables
including signed and unsigned.
unsigned long x, sum;
for (int i = 0; i<Max_I; i++)
{
sum = x*i;
}
Optimized:
unsigned long x, sum;
for (unsigned long i = 0; i<Max_I; i++)
{
sum = x*i;
}
c. Interger calculations are faster than floating point.
d. Keep in mind the processors word length. On 32 bit
processor 32-bit calculations and copying will be faster or equal
to 16-bit or 8-bit.
e. Interger Multiply/Divide by two is fastest using bit
shift operations.
f. Multiplication and Division operations are generally
costlier than Addition Substraction. Try to convert multiplication/division
to addition/substraction.
An Example:
Before
These are constants to the loop
extentY, extentX, TopLeftX, TopLeftY and are of type unsigned long
#define Target(cx,cy) pOffBits[serverResX * (cy) + (cx)]
#define Source(cx,cy) pBitsI[extentX * (cy) + (cx)]
for (int y = 0; y<extentY; y++)
for (int x = 0; x<extentX;
x++)
{
Target(x
+ TopLeftX, y + TopLeftY) = Source(x,y);
}
Optimized
unsigned long intermC = serverResX * TopLeftY + TopLeftX;
unsigned long intermZ = 0;
unsigned long intermY = 0;
for (unsigned long y = 0; y<extentY; y++)
{
unsigned long intermA = intermY
+ intermC;
for (unsigned long x = 0;
x<extentX; x++)
{
pOffBits[intermA
+ x] = pBitsI[intermZ + x];
}
intermZ += extentX;
intermY += serverResX;
}
Step By Step
1. Replacing the #define and making the loop variable data type
same as that of the others
for (unsigned long y = 0; y<extentY; y++)
for (unsigned long x = 0; x<extentX; x++)
{
pOffBits[serverResX * (y + TopLeftY)
+ (x + TopLeftX)] = pBitsI[extentX * y + x];
}
2. Rearranging the calculation
pOffBits[serverResX * y + x + serverResX * TopLeftY + TopLeftX]
= pBitsI[extentX * y + x];
Here (serverResX * TopLeftY + TopLeftX) is constant so we
can take it out of the loop
unsigned long intermC = serverResX * TopLeftY + TopLeftX;
for (unsigned long y = 0; y<extentY; y++)
for (unsigned long x = 0; x<extentX;
x++)
{
pOffBits[serverResX
* y + x + intermC] = pBitsI[extentX * y + x];
}
3. Now here both serverResX * y and extentX * y are
constant in the inner loop so we can take it out of the innder loop.
unsigned long intermC = serverResX * TopLeftY + TopLeftX;
for (unsigned long y = 0; y<extentY; y++)
{
unsigned long intermA = serverResX
* y;
unsigned long intermB = extentX
* y;
for (unsigned long x = 0; x<extentX;
x++)
{
pOffBits[intermA
+ x + intermC] = pBitsI[intermB + x];
}
}
4. Now you can see the multiplication serverResX * y going
in the loop. Where y ranges from 0 to extentY. We can convert it
to addition like this:
unsigned long intermC = serverResX * TopLeftY + TopLeftX;
unsigned long intermY = 0;
for (unsigned long y = 0; y<extentY; y++)
{
unsigned long intermA = intermY;
unsigned long intermB = extentX
* y;
for (unsigned long x = 0; x<extentX;
x++)
{
pOffBits[intermA
+ x + intermC] = pBitsI[intermB + x];
}
intermY += serverResX;
}
5. Similerly for extentX, and we gan get rid of intermA
and intermB
unsigned long intermC = serverResX * TopLeftY + TopLeftX;
unsigned long intermZ = 0;
unsigned long intermY = 0;
for (unsigned long y = 0; y<extentY; y++)
{
for (unsigned long x = 0; x<extentX;
x++)
{
pOffBits[intermY
+ x + intermC] = pBitsI[intermZ + x];
}
intermZ += extentX;
intermY += serverResX;
}
6. Also since intermY + intermC is constant in the inner
loop we can take it out
unsigned long intermC = serverResX * TopLeftY + TopLeftX;
unsigned long intermZ = 0;
unsigned long intermY = 0;
for (unsigned long y = 0; y<extentY; y++)
{
unsigned long intermA = intermY
+ intermC;
for (unsigned long x = 0; x<extentX;
x++)
{
pOffBits[intermA
+ x] = pBitsI[intermZ + x];
}
intermZ += extentX;
intermY += serverResX;
}
This is the optimized algorithm.
Note: All datatypes are same over here
Comments, questions, please mail me.
Akhilesh Singh
Last Updated 8th January 2005
|