Optimising a GLSL shader (notes)

The following is a list of potential optimisations for shaders:

Do as little work as possible:
  • Don't continually calculate something that is essentially static; pass the value to the shader using a uniform (see the sketch after this list).
  • If you have to calculate something every frame, store the result in a uniform or a texture so that it is available to any shader that has to run on every vertex or fragment.
  • If you need to calculate a lot of values at the start of the frame consider doing this using a shader and store the results in a texture.
  • Don't create and destroy resources.
  • Upload data (uniforms, buffers, textures) as infrequently as possible.
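
A minimal sketch of the "pass static values as uniforms" point above. The names (u_invResolution, u_premixedLight) are hypothetical; the idea is that the reciprocal of the viewport size and the pre-mixed light colour are computed once on the CPU instead of per fragment:

#version 330 core

uniform vec2 u_invResolution;   // 1.0 / viewport size, set once per resize
uniform vec3 u_premixedLight;   // light colour * intensity, combined once per frame

out vec4 fragColour;

void main()
{
    vec2 p = gl_FragCoord.xy * u_invResolution;     // no per-fragment division
    fragColour = vec4(u_premixedLight * p.x, 1.0);
}
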
Keep shaders simple and as small as possible:
  • Don't create large all-encompassing shaders. These take time to compile and optimise. Use the shader pre-processor to switch in different behaviours. I implemented an '#include' concept for my shaders.
  • Consider using ARB_separate_shader_objects.
  • Consider using shader sub-routines - in Mesa this is usually implemented as an 'if then else' construct with a uniform controlling the program flow. With Mesa >= 16 and with NVIDIA/AMD drivers I've not seen any impact on performance when using sub-routines with sensibly sized shaders (a sketch follows this list).
  • Because shader sub-routines may result in a large all-encompassing shader as a side effect of the target implementation, you have to validate that this will not impact your application.
  • On NVIDIA/AMD, and on Mesa 16.x (i965 driver) and newer, I would use shader sub-routines. On older versions of Mesa, or when using a different driver, I would be careful and test.
  • Turn on OpenGL debugging and look at the output produced. Mesa will at least warn you about register spills, which basically means that your shader is a bit too big or that there is a problem in the compiler/optimiser.
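
A sketch of a GLSL 4.x shader sub-routine, as mentioned above. The function names and the uniform u_shade are hypothetical; the host side selects the active function with glUniformSubroutinesuiv():

#version 400 core

// Sub-routine type and the uniform that selects the active implementation.
subroutine vec3 ShadeFn(vec3 n, vec3 l);
subroutine uniform ShadeFn u_shade;

subroutine(ShadeFn) vec3 lambert(vec3 n, vec3 l) { return vec3(max(dot(n, l), 0.0)); }
subroutine(ShadeFn) vec3 unlit(vec3 n, vec3 l)   { return vec3(1.0); }

in  vec3 v_normal;
uniform vec3 u_lightDir;
out vec4 fragColour;

void main()
{
    fragColour = vec4(u_shade(normalize(v_normal), u_lightDir), 1.0);
}

On implementations that lower this to an 'if then else' on a uniform (as Mesa does), the cost is much the same as writing that branch by hand.
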
Math:
  • Avoid iterative algorithms. Consider taking a different approach to performing a calculation, such as using a minimax polynomial. Don't use a Taylor series.
  • Know your sin/cos/tan/... properties and relationships.
  • Consider what GLSL functions are available and use the appropriate one, e.g. inversesqrt, fma, etc.
  • Don't calculate something unless you have to.
  • Consider re-arranging the algorithm to reduce the number of operations or expensive math calls.
  • GLSL is close enough to C that you can test on the CPU and use a performance analysis tool.
  • Avoid using division.
  • Consider using the MAD (multiply-add) instruction (see the sketch at the end of this list).
  • Consider the evaluation order of operators.
  • Keep float operations together and vec operations together.
  • pow(x, y), for x > 0, can be implemented as:
exp2(log2(x) * y)
  • exp(x) can be implemented as:
exp2(x * 1.442695)
  • log(x) can be implemented as:
log2(x) * 0.693147
  • sign(x), if we don't care about zero, can be implemented as:
(x>0) ? 1 : -1
  • sign(x) * y can be implemented as
(x >= 0) ? y : -y

It is reported that conditional assignment is faster than calling sign().
  • A more accurate atan can be implemented as:
// M_PI_2 is not built into GLSL, so define it.
const float M_PI_2 = 1.5707963267948966;

float ATAN(in float y) {
  bool s = (1.0 > abs(y));
  if (s)
    return atan(y, 1.0);          // |y| < 1: the ratio passed to atan is already <= 1
  return M_PI_2 - atan(1.0, y);   // |y| >= 1: use the reciprocal so the ratio stays <= 1
}
  • atan2 may not be as robust as you think; see the Stackoverflow 'robust atan2' question.
  • pow is known to have accuracy issues on i965 - I can vouch for this.
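
A small sketch of the division and MAD points above. u_invFarPlane is a hypothetical uniform holding a reciprocal computed once on the CPU; fma() (GLSL 4.0+) makes the multiply-add explicit, although most compilers will generate a MAD from a * b + c anyway:

#version 400 core

uniform float u_invFarPlane;   // 1.0 / farPlane, computed once on the CPU

in  float v_depth;
out vec4 fragColour;

float scaleDepth(float depth, float nearScale)
{
    float d = depth * u_invFarPlane;   // multiply by a pre-computed reciprocal, no divide
    return fma(d, nearScale, 0.5);     // a single multiply-add: d * nearScale + 0.5
}

void main()
{
    fragColour = vec4(vec3(scaleDepth(v_depth, 0.75)), 1.0);
}
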
Debugging:
  • Hook up all the debugging extensions. You will be amazed at how much information most OpenGL drivers will provide to you.
  • For math algorithms implement on the CPU first.
  • If any of the standard math library functions fail to work, look at the symptoms and first try to work out what the pattern of the error is. Based on this you can potentially narrow the issue down to a handful of functions. Implement each function in turn using a soft-float implementation.
  • If you are really stuck on this then I am available for contract work (damian dot dixon @ acm dot org).
Code flow:
  • Avoid if and while statements.
  • Use the preprocessor to deal with code flow if at all possible (see the sketch below).
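
A sketch of preprocessor-driven code flow. USE_FOG is a hypothetical macro that the application injects when it builds the shader source, so the fog path is compiled in or out rather than branched on at run time:

#version 330 core

uniform vec3  u_fogColour;
uniform float u_fogAmount;

in  vec3 v_colour;
out vec4 fragColour;

void main()
{
    vec3 c = v_colour;
#ifdef USE_FOG
    c = mix(c, u_fogColour, u_fogAmount);   // only present in the fog variant
#endif
    fragColour = vec4(c, 1.0);
}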

Pragmas in a shader

There is no defined list.

Each implementation has its own set.

Some common ones are:
  • #pragma debug(on)
  • #pragma debug(off)
  • #pragma optimize(on)
  • #pragma optimize(off)
  • #pragma invariant(all)
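
A placement sketch only; what, if anything, these pragmas actually do is implementation defined:

#version 330 core
#pragma debug(on)       // request debug annotations, where supported
#pragma optimize(off)   // turn the optimiser off for this shader

out vec4 fragColour;

void main()
{
    fragColour = vec4(1.0, 0.0, 1.0, 1.0);
}
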
NVIDIA specific pragmas:
  • #pragma optionNV(fastmath on)
  • #pragma optionNV(fastprecision on)
  • #pragma optionNV(ifcvt none)
  • #pragma optionNV(inline all)
  • #pragma optionNV(strict on)
  • #pragma optionNV(unroll all)
There are also some environment variables that can be defined (options typically controlled globally with NVEmulate):
  • __GL_WriteProgramObjectAssembly
  • __GL_WriteProgramObjectSource
  • __GL_WriteInfoLog
  • __GL_ShaderPortabilityWarnings

Discussion

Some of the above optimisations use features that are not available on all OpenGL implementations.

You should consider what your lowest common denominator is and program to that level. You can then check for additional extension support and enable optimisations as needed.
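
A sketch of guarding an optimisation behind an extension check inside the shader itself. The GLSL compiler pre-defines a macro for every extension it supports, so older implementations silently take the portable path (the fma() example here assumes GL_ARB_gpu_shader5):

#version 330 core

#ifdef GL_ARB_gpu_shader5
#extension GL_ARB_gpu_shader5 : enable
#endif

in  vec3 v_colour;
out vec4 fragColour;

float blend(float a, float b, float t)
{
#ifdef GL_ARB_gpu_shader5
    return fma(t, b - a, a);     // fma() is provided by this extension
#else
    return a + t * (b - a);      // portable fallback
#endif
}

void main()
{
    fragColour = vec4(vec3(blend(v_colour.r, v_colour.g, v_colour.b)), 1.0);
}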

This approach will ensure that your application works on the majority of hardware that is in circulation. In the business or defence world people usually have some awful hardware that can't be changed.
