An R snippet for eliciting priors via quantiles


It’s fine to use a vague prior for a Bayesian analysis when we really don’t have any idea about what a parameter is (or if we want to pretend we don’t), but when there is prior knowledge available, it’s often a waste to not draw from it. This prior knowledge might come from a non-statistician scientist, and so it’s important to have a way for them to communicate their knowledge to us in a way that allows us to construct a prior. However, unless the prior we want to construct is a Bernoulli or maybe a normal distribution, eliciting distribution parameters directly is unlikely to work.

Perhaps a more reliable and intuitive way is to specify quantiles, for example “we’re 50% sure that the proportion is less than 0.01 and 99% sure that it’s less than 0.05.” Giving two quantiles in this way is enough to determine a distribution from a two-parameter family, and is much easier to understand than saying that our prior knowledge follows a Beta(a, b) distribution.

Unless you’re much better at guessing parameters from quantiles than I am, it’s helpful to have a function that takes in the quantiles and returns the corresponding parameter values. Here’s the function I use for normal, beta, and gamma distributions, which are the three I most commonly use for priors (along with inverse-gamma priors, which can be obtained by inverting the desired quantiles in the gamma case).

params.from.quantiles <- function(q, p, family="normal",
                                  start=c(0.5, 0.5)) {
  # Finds parameter values of distributions that closely match
  # the specified quantiles.
  #
  # Args:
  #   q: A vector of quantiles (length 2, strictly increasing)
  #   p: A vector of percentiles (length 2, strictly increasing)
  #   family: One of "normal", "beta", or "gamma" specifying
  #           which family of distributions to use
  #   start: starting values for parameter search
  #
  # Returns:
  #    A list containing a named vector of parameter values (par),
  #    a named vector containing the fitted quantiles (cdf),
  #    and the family used (family)
  
  # basic input validation
  stopifnot(length(p) == 2,
              length(q) == 2,
              p[1] < p[2],
              q[1] < q[2],
              0 < p[1],
              p[2] < 1)
  
  # select distribution function
  g <- switch(family,
              normal=pnorm,
              beta=pbeta,
              gamma=pgamma)
  
  # a wrapper of the cdf for optim
  f <-function(theta) {
    # if parameter values are invalid, return NA
    tryCatch({
      sum((g(q, theta[1], theta[2]) - p)^2)
    }, warning=function(w) {
      NA
    })
  }
  
  # find best parameter values
  sol <- optim(start, f)
  
  # prepare output
  out <- list()
  out$par <- sol$par
  names(out$par) <- switch(family,
                           normal=c("mean", "sd"),
                           beta=c("a", "b"),
                           gamma=c("shape", "rate"))
  
  out$cdf <- g(q, out$par[1], out$par[2])
  names(out$cdf) <- q
  
  out$family <- family
  
  out
}

An example:

> params.from.quantiles(q=c(0.01, 0.05), p=c(0.5, 0.99), family="beta")
$par
         a          b 
  1.377637 105.506531 

$cdf
     0.01      0.05 
0.4999679 0.9899850 

$family
[1] "beta"

The function is also available as a gist on Github.