The Ziggurat Algorithm is a method for efficient random sampling from a probability distribution such as the Gaussian distribution. The probability distribution function (pdf) for the Gaussian distribution is given by:
Where μ is the distribution mean and σ is the standard deviation.
Internally the sampling algorithm uses a simplified pdf by using a mean of 0 and a standard deviation (sd) of 1, which allows those terms to be factored out of the the pdf. Sampling from a distribution with a different mean and/or sd is then a trivial manipulation of the generated values. The pdf is also simplified further by denormalising (factoring out constant terms) See Optimisations #2 (below).
The Ziggurat algorithm is a hybrid method that obtains its efficiency principally by using rejection sampling with some less efficient calculations performed for less commonly executed corner cases.
Rejection sampling is a method by which uniformly distributed random samples are selected or rejected such that the selected samples have a particular desired distribution. For example in the case of a Gaussian pdf we could define a rectangular area over the main body of the Gaussian pdf (ignoring the distribution tails for a moment) and randomly generate (x,y) points within the square. We now reject all points that are above the pdf curve, i.e. where y > f(x). We can then yield the x coordinate of each of the accepted points to generate random Gaussian numbers. It is easy to see that the proportion with which points with a specific x coordinate are accepted is proportional to f(x) and therefore uniformly generated x coordinates are filtered to yield Gaussian distributed values.
The naive rejection method described above has two main sources of inefficiency.
The Ziggurat algorithm addresses these two issues by covering the pdf with a series of horizontal rectangles rather than a single square, and in an arrangement that attempts to cover the pdf as efficiently as possible, i.e with minimum area outside of the pdf curve. The following diagram demonstrates the approach. Note that we operate on one side of the pdf (x >= 0), generating both positive and negative sample values requires that that as a final step we randomly flip the sign of the generated non-negative values.
Sampling from all other rectangles/segments other then R0/S0. Randomly select x from within R[i]. If x is less than x[i+1] then x is within the curve, return x.
If x is greater than or equal to x[i+1] then generate a random y variable from within R[i] (this amounts to producing a high resolution y coordinate, a refinement of the low resolution y coord we effectively produced by selecting a rectangle/segment).
If y < f(x) then return x, otherwise we reject the sample point and return to step 1. We specifically do *not* re-attempt to sample from R[i] (see special notes 1).
Segments have equal area and are thus selected with equal probability. However, the area under the distribution curve covered by each segment/rectangle differs due to the varying shape of the pdf where the rectangles overlap its edge. Thus it could be suggested that to avoid sampling bias that the segments should be selected with a probability that reflects the area of the distribution curve they cover not their total area, this is an incorrect approach for the algorithm as described above. To explain why consider an extreme case...
Say that rectangle R1 covers an area entirely within the distribution curve, now consider that R2 covers an area only 10% within the curve. Both rectangles are chosen with equal probability, thus the argument is that R2 will be 10x overrepresented (will generate sample points as often as R1 despite covering a much smaller proportion of the area under the distribution curve). In reality sample points within R2 will be rejected 90% of the time and we do not attempt to re-sample from R2 to obtain a an accepted sample point, instead we return to step 1 (select a segment to sample from).
If instead we re-attempted sampling from R2 until a valid point was found then R2 would indeed become over-represented, hence we do not do this and the algorithm therefore does not exhibit any such bias.
The Gaussian pdf contains terms for distribution mean and standard deviation. We can remove all excess terms and denormalise the function to obtain a simpler equation with the same shape. This simplified equation is no longer a pdf as the total area under the curve is no loner 1.0 (a key property of pdfs), however as it has the same overall shape it remains suitable for sampling from a Gaussian using rejection methods such as the Ziggurat algorithm (it's the shape of the curve that matters, not the absolute area under the curve).
Simplified and denormalised version of the Gaussian pdf with mean = 0 and s.d. = 1.
|Cores||Box Muller (Million samples/sec)||Ziggurat (Million samples/sec)||Speedup|
The Ziggurat algorithm gives good performance by using a very simple rejection sampling execution path for the majority of sample points generated, but with more expensive calculations performed to maintain mathematical exactness in some specific corner cases represented by the distribution tail and the far edges of the Ziggurat's rectangles.
A faster alternative is to use an approximation of the Gaussian pdf, specifically a piecewise linear representation which amounts to replacing the stacked rectangles with stacked trapezoids (sloped right hand edge to more closely match the slope of the pdf).
For many uses the loss of exactness in exchange for speed will be a good choice. For more details on this approach see Hardware-Optimized Ziggurat Algorithm for High-Speed Gaussian Random Number Generators, Hassan M. Edrees, Brian Cheung, McCullen Sandora, David Nummey, Deian Stefan
Colin, September 2011