Saturday, March 12, 2011

Software occlusion culling pt.2: Implementation

In the previous post i’ve described occlusion culling technique that tries to take tad different steps on resolving visibility information. In this one i’m gonna share basic implementation that performs pretty well. Following steps are performed:

  • Transform objects (occluder / test geometry) bounding boxes to viewport space. Determine whether we should clip its triangles or not and tile bounds that object is in. Invisible objects are skipped. Also calculate view space Z bounds that later will used as sort key.
  • Transform vertices ( for objects intersecting frustum planes do transformation and clipping ) to viewport space.
  • Allocate output triangles and in each tile that triangle intersects store allocated triangle indices. We choose to store triangles in groups of four in SoA style. Also the backface mask is calculated during this step so backfacing triangles could be skipped in rasterizer
    Now as we have triangle lists for each tile we sort them using previously stored Z value.
  • Sorted lists are drawn, where occluder triangle scanlines are merely ORed with corresponding scanline of occlusion buffer of tile and test triangles are checked whether any pixels of scanline are visible.

Performance of the rasterizer is greatly affected by the resolution. In the demo i use 720p which can be easily reduced. Triangle list sorting can be expensive if all triangles are in same tiles. Also it could be beneficial to split tiles vertically and have masks for each tile so we can quickly check if tile is already full. I’m currently working on MT implementation and performance is scaled linearly with number of working threads just as expected. Though it seems making fast and reliable job manager on Windows is not that easy but that’s completely different topic.

Demo with source code


Matt said...

Have you tried using tbb for scheduling?

kas said...

not yet, but its on my todo. i've tried naive job manager but it doesn't scale well. i could only get 2x speedup. i'm gonna try TBB and OMP later. right now i'm thinking how to speed up transform stage as it is bottleneck