Closures in R


In my earlier post Building a Bayesian counterpart to R’s lm() function, I wrote a blm() function that took arguments similar to those passed to the usual lm() function and returned information about the posterior distribution of linear model parameters. The marginal posterior of the regression coefficients in that model is a multivariate t distribution with a non-centrality vector, dispersion matrix, and degrees of freedom as parameters. In order to sample from this distribution given a fit object, we could load the mvtnorm package and pass the parameters to the rmvt() function as follows:

sample <- rmvt(n=10,
               sigma=fit$posterior.params$t.cov,
               df=fit$posterior.params$t.df,
               delta=fit$posterior.params$t.mean)

This isn’t so bad, but it could be a lot easier if we could sample it by calling the following:

sample <- fit$rmarginal(n=10)

Simpler, right? The fit object would include a function rmarginal() which samples from the t distribution with the correct parameters. In order to do this we use an R concept called a closure, which is a function packaged along with an environment, basically a list of all the variables in the scope in which the function was declared, along with their values. To create the rmarginal() function, we find the code near the end of the fit.blm() function:

posterior.params <- list(mean=mean,
                           precision=precision,
                           df=df,
                           scale=scale,
                           a=a,
                           b=b,
                           t.mean=t.mean,
                           t.cov=t.cov,
                           t.df=t.df)

  list(coefficients=coefficients,
       residuals=residuals,
       fitted.values=fitted.values,
       rank=rank,
       df.residual=df,
       posterior=posterior.params)

and change it to the following:

posterior.params <- list(mean=mean,
                           precision=precision,
                           df=df,
                           scale=scale,
                           a=a,
                           b=b,
                           t.mean=t.mean,
                           t.cov=t.cov,
                           t.df=t.df)

  rmarginal <- function(n) {
      library(mvtnorm)
      rmvt(n,
           sigma=posterior.params$t.cov,
           df=posterior.params$t.df,
           delta=posterior.params$t.mean)
  }

  list(coefficients=coefficients,
       residuals=residuals,
       fitted.values=fitted.values,
       rank=rank,
       df.residual=df,
       posterior=posterior.params,
       rmarginal=rmarginal)

When we declare the rmarginal() function we reference posterior.params$t.cov, along with the other distribution parameters. Since rmarginal() relies on these variables which are defined in the scope of the fit.blm() function, it carries them along with itself. This combination of the function and the data attached to it is the closure. In order to draw a sample of size ten from the marginal posterior we can then simply call fit$rmarginal(10) and get the appropriate result!