The Brier score is a way of quantifying the accuracy of probabilistic forecasts of binary events, for example the statement that there is an 80% chance that it will rain in a certain area tomorrow. The Brier score for *n* predictions and events is defined as

BS = (1/*n*) Σ_{i=1}^{n} (*p_i* - *y_i*)^2,

where *p_i* is the predicted probability that an event will occur, and *y_i* is 1 if the event occurs and 0 otherwise.
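The definition above can be sketched in a few lines of code (an illustrative sketch; the function and variable names are my own):

```python
# Brier score: the mean squared difference between predicted
# probabilities p_i and binary outcomes y_i.

def brier_score(predictions, outcomes):
    """Mean of (p_i - y_i)^2 over all forecast/outcome pairs."""
    n = len(predictions)
    return sum((p - y) ** 2 for p, y in zip(predictions, outcomes)) / n

# The 80% rain forecast from above scores (0.8 - 1)^2 = 0.04 if it
# rains and (0.8 - 0)^2 = 0.64 if it does not.
print(brier_score([0.8], [1]))  # ~0.04
print(brier_score([0.8], [0]))  # ~0.64
```

Lower is better: a perfectly confident, perfectly correct forecaster scores 0.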

The Brier score can be viewed as a loss function with the *p_i* being the decisions and the *y_i* the outcomes, and is in fact a form of squared error loss. While the squared error loss is a natural choice to a statistician, I wondered if there might be other reasonable choices for scoring this sort of prediction, or if a small set of desirable properties of a scoring function yields the Brier score as the unique solution. It turns out that three reasonable requirements on a loss function *l*(*y*; *x*) are enough to specify the Brier score up to a multiplicative constant:

1. **Integrity:** if the decision-maker believes that an event has a probability *p* of occurring, then *p* should be the unique prediction that minimizes the risk *r*(*x*; *p*), or expected loss E[*l*(*Y*; *x*)].
2. **Symmetry:** *l*(0; *x*) = *l*(1; 1-*x*).
3. **No penalty for perfect predictions:** *l*(1; 1) = 0 and *l*(0; 0) = 0 (it turns out one of these implies the other given points 1 and 2).
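The integrity requirement is what rules out other seemingly natural choices. For instance, absolute-error loss *l*(*y*; *x*) = |*y* - *x*| fails it: its risk (1-*p*)*x* + *p*(1-*x*) is linear in *x*, so the minimizing prediction is 0 or 1, not the believed probability. A quick numerical sketch (names mine):

```python
# Risk of absolute-error loss l(y; x) = |y - x| for a believed
# probability p: (1-p)*|0 - x| + p*|1 - x|. Because this is linear
# in x, the optimal "prediction" sits at a corner, not at p.

def abs_risk(x, p):
    return (1 - p) * abs(0 - x) + p * abs(1 - x)

p = 0.3
grid = [i / 1000 for i in range(1001)]
best_x = min(grid, key=lambda x: abs_risk(x, p))
print(best_x)  # 0.0 -- absolute loss pushes the forecast to a corner
```

A forecaster scored this way would be rewarded for exaggerating toward certainty, which is exactly what integrity forbids.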

We start with the integrity requirement. Because the outcome is binary, the risk when the decision-maker believes that an event has probability *p* of occurring is *r*(*x*; *p*) = (1-*p*) *l*(0; *x*) + *p* *l*(1; *x*). We can find all the minima by finding the points *x* at which the first derivative with respect to *x*, *r*'(*x*; *p*) = (1-*p*) *l_x*(0; *x*) + *p* *l_x*(1; *x*), is equal to zero, and the second derivative is non-negative.
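Before pushing the symbols further, it is easy to check numerically that squared-error loss satisfies integrity: the risk (1-*p*)*x*^2 + *p*(1-*x*)^2 is minimized exactly at *x* = *p* (an illustrative sketch; names are my own):

```python
# Risk of squared-error loss l(y; x) = (y - x)^2 when the event is
# believed to occur with probability p. Integrity demands that the
# minimizing x equal p itself.

def risk(x, p):
    return (1 - p) * x ** 2 + p * (1 - x) ** 2

p = 0.3
grid = [i / 1000 for i in range(1001)]
best_x = min(grid, key=lambda x: risk(x, p))
print(best_x)  # 0.3 -- the best prediction is the believed probability
```

The derivation below shows that, given the other two requirements, squared error is essentially the *only* loss with this property.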

By the symmetry requirement, *l*(0; *x*) = *l*(1; 1-*x*), and differentiating with respect to *x* gives *l_x*(0; *x*) = -*l_x*(1; 1-*x*). Thus *r*'(*x*; *p*) = (1-*p*) *l_x*(0; *x*) + *p* *l_x*(1; *x*) = 0 can be rewritten as *r*'(*x*; *p*) = -(1-*p*) *l_x*(1; 1-*x*) + *p* *l_x*(1; *x*) = 0. Since *x* = *p* must be the minimizer, we have (1-*p*) *l_x*(0; *p*) = -*p* *l_x*(1; *p*) for every *p*, so both sides are proportional to *p*(1-*p*); writing the constant of proportionality as *c*, we get *l_x*(0; *p*) = *c* *p* and *l_x*(1; *p*) = -*c* (1-*p*), and by symmetry *l_x*(1; 1-*p*) = -*c* *p* and *l_x*(0; 1-*p*) = *c* (1-*p*). Since these conditions hold at every *p*, the second derivative of the risk is *r*''(*x*; *p*) = (1-*p*) *c* + *p* *c* = *c*, which must be non-negative, so *c* is non-negative. Since the minimizer is to be unique, *c* cannot be zero. Integrating *l_x*(0; *x*) = *c* *x* and *l_x*(1; *x*) = -*c* (1-*x*) with respect to *x*, and using symmetry to match the constants of integration, gives *l*(*y*; *x*) = (*c*/2)(*y* - *x*)^2 + *k*; absorbing the factor of one half into *c*, we have *l*(*y*; *x*) = *c* (*y* - *x*)^2 + *k* for all *x*. The lack of penalty for perfect predictions implies that *k* = 0.

Thus we obtain the Brier score up to a multiplicative constant. Such a factor doesn't change any meaningful properties of the loss function, so this isn't really a problem. Why not set it to 1?
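As a closing sanity check, a small sketch (names mine) confirming that *l*(*y*; *x*) = *c*(*y* - *x*)^2 satisfies all three requirements; integrity holds because the risk derivative works out to *r*'(*x*; *p*) = 2*c*(*x* - *p*), which vanishes only at *x* = *p*:

```python
# Verify symmetry and the no-penalty condition for l(y; x) = c*(y - x)^2
# with c = 1. (Integrity was checked analytically above: r'(x; p)
# = 2c*(x - p) is zero only at x = p, and r'' = 2c > 0.)

def l(y, x, c=1.0):
    return c * (y - x) ** 2

# Symmetry: l(0; x) == l(1; 1 - x) for every x.
symmetric = all(abs(l(0, x) - l(1, 1 - x)) < 1e-12
                for x in [0.0, 0.2, 0.5, 0.8, 1.0])

# No penalty for perfect predictions.
perfect = (l(1, 1) == 0.0 and l(0, 0) == 0.0)

print(symmetric, perfect)  # True True
```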

*Note: additivity is a standard property of loss functions, so I didn’t include it as a separate requirement for this particular case.*