I’ve seen the stick-breaking construction for Dirichlet processes presented several times and read about it in a few places, but what really helped me understand what goes on was visually drawing out the algorithm.

A Dirichlet process D(Fα) centered around distribution F with concentration parameter α is a distribution of probability distributions with expected value F. The concentration parameter α controls how discretized the realizations of the processes are. In the limit α → 0, each of the distributions are concentrated at a single point (but not the same point in every draw). When α → ∞, the drawn distributions are continuous and look like F.

The stick-breaking process expresses the pdf of the realized distribution as $f(x) = \sum_{k=1}^{\infty} \beta_k \delta_{x_k}(x)$, where the $\delta_{x_k}$ are Dirac delta-distributions centered at xk. The xk are independently identically distributed, and the βk are determined by a “stick-breaking” algorithm: $\beta_k = \beta_k^\prime \prod_{i=1}^{k-1} (1-\beta_i^\prime)$ where the βk’ are iid Beta(1, α) distributed.

For better or worse, I often picture distributions as probability density functions (pdfs) or histograms, but here it’s especially useful to think of them as cumulative distribution functions (cdfs).

Suppose F is the standard normal distribution, which has the cdf below.

The normal cdf

If we draw a sample of size n=5 from this distribution and construct the empirical cdf, we get something like the plot below.

The empirical cdf from n=5 observations from the standard normal distribution

Strictly speaking, a Dirichlet process is an infinite process, but for practical purposes the realizations are truncated when the βk become sufficiently small. For the purposes of illustration I’ll simplify even further and stop at k=5. The stick-breaking algorithm can be thought of as starting with a stick of length 1, and breaking it at a proportion β1’ along its length, and repeating with the remainder of the stick. A visualization is below.

Stick-breaking illustration

To use these βs to construct the realized cdf, we note that the empirical cdf has a jump of 0.2 at each observation, and simply replace 0.2 with βk for each observation k:

The realized cdf of the Dirichlet process