###### Article

# Blinded by Random Variables

It is seldom possible to make two separate measurements of the same physical system that give exactly identical results. Any measurement can be considered a random variable (RV), which has a mean value and a standard deviation, or error. The square of the error is called the variance. In physics and biology, measurements are usually combined in interesting ways to gain more insight into Nature. When combining measurements, an interesting question arises: given that a measurement is manipulated in some way, what is done with the corresponding error? Put another way, if an operation is performed on a measurement, what is the variance in the result? This is the propagation of errors (POE) problem.

Most students of science are introduced to some form of a POE estimate in the first year of university. I was introduced to two versions of the POE estimate, one in physics and one in chemistry. Curiously, they were not the same. The chemistry POE estimate was simple: whatever is done to the measurements is also done to the errors. Naturally the physics POE estimate was more complex, and is summarized in the equation

σ_{z}^{2} = Σ_{i} (^{∂z}⁄_{∂x_{i}})^{2}σ_{x_{i}}^{2}. (1)

In words: if z is a function of several measurements x_{i}, then the variance (the standard deviation squared) in z is the sum, over the measurements, of the square of the product of the partial derivative of the function with respect to the measurement and the standard deviation (or error) of that measurement. For example, if z = x + y (and x and y are “independent”) then σ_{z}^{2} = σ_{x}^{2} + σ_{y}^{2}. The physics POE estimate is a better estimate than the chemistry POE estimate. However, even the physics POE estimate is only an estimate, and should not be applied blindly.

The POE estimate formula is derived using Taylor’s expansion and is based on perturbation theory, which relies on the perturbation being small. The perturbation in this case is the standard deviation. If the perturbation is large, then the neglected higher-order terms of the Taylor expansion will contribute more to the error than the first-order term. The derivation of the POE estimate formula can be found in *An introduction to error analysis* by John R. Taylor (1). For those readers who wish to see the POE fail, try the following: generate 100 Poisson distributed measurements with a mean value of 9. These measurements will have a standard deviation of ~3. Take the exponential, z = e^{x}, of the measurements and calculate the new error. Now compare with the estimate from the POE formula. The POE gives σ_{z} = 3e^{9}, whereas the true propagated error is e^{9(e – 1)}(e^{9(e – 1)^{2}} – 1)^{1/2}.
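The scale of this failure is easy to see numerically. Below is a sketch using NumPy (the seed and sample size are my own choices); the exact σ_{z} follows from the Poisson moments 〈e^{x}〉 = e^{9(e – 1)} and 〈e^{2x}〉 = e^{9(e² – 1)}:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.poisson(lam=9, size=100_000)  # Poisson measurements, mean 9, std ~3
z = np.exp(x)                         # pass each measurement through z = e^x

taylor = 3 * np.e**9                  # Taylor POE estimate: sigma_z = 3 e^9
# exact sigma_z = sqrt(<e^2x> - <e^x>^2), from the Poisson moments above
exact = np.e**(9 * (np.e - 1)) * np.sqrt(np.e**(9 * (np.e - 1)**2) - 1)

# The sample error dwarfs the Taylor estimate (and z is so heavy-tailed
# that even the sample standard deviation converges very slowly).
print(z.std(), taylor, exact)
```

The Taylor estimate here is off by roughly eight orders of magnitude.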

**Definitions.** At this point some important definitions should be stated. These definitions should help the reader understand what follows, which is an attempt at a better POE formula.

Def: Two or more RVs are independent if the outcome of one RV has no influence on the outcome(s) of the other(s).

Def: A probability density is a function of an RV that describes the likelihood that the argument will be the result of a measurement. See Figure 1 for examples of probability densities.

Def: A characteristic function is the Fourier transform of a probability density, Φ_{x}(k) = ∫ρ_{x}(x)e^{-2πikx}dx. The characteristic function has several properties: Φ_{x}(k)|_{k=0} = 1 (all densities are normalized); (^{-1}⁄_{2πi}) ^{dΦ_{x}}⁄_{dk}|_{k=0} = 〈x〉 (the derivative evaluated at zero gives the mean); and (^{-1}⁄_{4π^{2}}) ^{d^{2}Φ_{x}}⁄_{dk^{2}}|_{k=0} = 〈x^{2}〉 (the second derivative evaluated at zero gives the mean value of the square).
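These properties can be checked directly from samples. The sketch below is my own construction, assuming Bracewell's e^{-2πikx} sign convention (which matches the 2π factors in the properties above); the Gaussian density, seed, and step size are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=200_000)  # <x> = 2, <x^2> = 4.25

def phi(k):
    # empirical characteristic function of the samples
    return np.mean(np.exp(-2j * np.pi * k * x))

h = 1e-4  # finite-difference step for the derivatives at k = 0
dphi = (phi(h) - phi(-h)) / (2 * h)
d2phi = (phi(h) - 2 * phi(0) + phi(-h)) / h**2

mean_est = (-1 / (2j * np.pi)) * dphi    # recovers the mean, ~2.0
msq_est = (-1 / (4 * np.pi**2)) * d2phi  # recovers <x^2>, ~4.25
```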

**Conditional statistics.** The probability density of several RVs is called the joint probability density, ρ_{z,x}. The conditional probability density ρ_{z|x} is defined as the probability density of z when x is given (or when x has already occurred). There is a relationship between the joint density and the conditional density (2),

ρ_{z,x} = ρ_{z|x}ρ_{x}. (2)

How can Eq. (2) be used to derive a POE estimate? Let us examine the addition of two independent RVs (IRVs), i.e. z = x + y.

First the joint density of z and x must be calculated. This calculation can be simplified using Eq. (2) to the calculation of the conditional density of z given x. How is the conditional density calculated? Since x is given we may write y = z – x, and ask: what is the probability that y takes on the value z – x? The answer is the probability density of y evaluated at y = z – x, ρ_{z|x} = ρ_{y}(z – x). The next step is to determine the probability density of z alone. To do this we integrate the joint density of z and x over all possible values (outcomes) of x,

ρ_{z}(z) = ∫ρ_{x}(x)ρ_{y}(z – x)dx. (3)

This equation is the familiar *convolution* of the densities of x and y. Now the powerful convolution theorem can be applied to show that the characteristic function of z is the product of the characteristic functions of x and y, Φ_{z} = Φ_{x}Φ_{y}. The special properties of the characteristic function can be used to calculate the mean and variance of z,

〈z〉 = (^{-1}⁄_{2πi}) ^{d(Φ_{x}Φ_{y})}⁄_{dk}|_{k=0} = 〈x〉Φ_{y}(0) + 〈y〉Φ_{x}(0). (4)

Eq. (4) reduces to 〈z〉 = 〈x〉 + 〈y〉 since the densities are normalized. The variance of z, σ_{z}^{2}, is the difference of 〈z^{2}〉 and 〈z〉^{2}. The value of 〈z^{2}〉 can be calculated using the special properties of characteristic functions,

〈z^{2}〉 = (^{-1}⁄_{4π^{2}}) ^{d^{2}(Φ_{x}Φ_{y})}⁄_{dk^{2}}|_{k=0}, (5)

which reduces to 〈z^{2}〉 = 〈x^{2}〉 + 2〈x〉〈y〉 + 〈y^{2}〉. The variance in z can be calculated by subtracting the square of the mean value of z. After some algebra the result is σ_{z}^{2} = σ_{x}^{2} + σ_{y}^{2}, in agreement with the physics POE estimate.
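The result is easy to verify by simulation (a sketch; the two densities, seed, and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
x = rng.exponential(scale=2.0, size=n)  # variance 4
y = rng.uniform(-1.0, 1.0, size=n)      # variance 1/3
z = x + y

# variance of the sum vs sum of the variances
print(z.var(), x.var() + y.var())
```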

For the addition of IRVs the POE formula derived from Taylor’s expansion and the POE formula derived from conditional statistics agree exactly. What about the product of IRVs? Taylor’s POE formula is only approximate (if z = xy then σ_{z}^{2} = 〈y〉^{2}σ_{x}^{2} + 〈x〉^{2}σ_{y}^{2}) because it relies on perturbation theory and on the assumption that the mean value of a function of a random variable is equal to the function evaluated at the mean value, 〈ƒ(x)〉 = ƒ(〈x〉).

Let us determine the POE for the product of two IRVs, z = xy, using the conditional approach. Again the problem reduces to the calculation of the conditional density through Eq. (2). As for the addition of IRVs we may write y in terms of z and x, y = z/x, since x is given. However, the conditional density cannot simply be written as the density of y evaluated at z/x. The integral of the density before and after the substitution y = z/x must be identical. To ensure this the density must be multiplied by the absolute value of the derivative of y with respect to z, ρ_{z|x} = ρ_{y}(^{z}⁄_{x})|^{1}⁄_{x}|. The probability density of z alone is again calculated by integrating the joint density over all values of x,

ρ_{z}(z) = ∫ρ_{x}(x)ρ_{y}(^{z}⁄_{x})|^{1}⁄_{x}|dx. (6)

Unfortunately, this is not a convolution. However, the Fourier transform and the properties of the characteristic function are still useful. Taking the Fourier transform of Eq. (6) and letting y = z/x as in the stretch theorem (all Fourier analysis theorems, such as the stretch theorem, can be found in *The Fourier transform and its applications* (3)), we get

Φ_{z}(k) = ∫ρ_{x}(x)Φ_{y}(kx)dx. (7)

The properties of characteristic functions can be applied to determine the mean and variance of z. After some further algebra it can be shown that the mean of z is the product of the means of x and y, and the variance of z is 〈x^{2}〉〈y^{2}〉 – 〈x〉^{2}〈y〉^{2}, which can be expanded to σ_{x}^{2}σ_{y}^{2} + 〈x〉^{2}σ_{y}^{2} + 〈y〉^{2}σ_{x}^{2}. The conditional formula has an additional term, the product of the variances, σ_{x}^{2}σ_{y}^{2}. This shows that the Taylor approximation is reasonable provided the errors are small compared to the means.
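A simulation shows both the extra term and why the Taylor estimate is often acceptable (a sketch; the means, errors, seed, and sample size are my own choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
x = rng.normal(3.0, 2.0, size=n)  # <x> = 3, sigma_x = 2
y = rng.normal(1.0, 2.0, size=n)  # <y> = 1, sigma_y = 2
z = x * y

taylor = 1**2 * 2**2 + 3**2 * 2**2  # <y>^2 sx^2 + <x>^2 sy^2 = 40
exact = taylor + 2**2 * 2**2        # + sx^2 sy^2 = 56

print(z.var(), taylor, exact)
```

With errors comparable to the means, as here, the neglected σ_{x}^{2}σ_{y}^{2} term is a sizeable fraction of the variance; for small relative errors it becomes negligible.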

For the equation z = x^{2} the Taylor POE estimate, σ_{z}^{2} = 4〈x〉^{2}σ_{x}^{2}, is very misleading. The Taylor expansion is based on expressing a function as a sum of polynomials. Hence, the Taylor expansion of a polynomial is the polynomial itself, and *no term can be neglected*.

The conditional POE can be used on z = x^{2}, but some caution is in order. The formula ρ_{z} = ρ_{x}(z^{1/2}) ^{1}⁄_{2}z^{-1/2} may not be correct if x can take on both positive and negative values. The problem is that x^{2} has no well-defined inverse. One inverse, the principal branch, is x = z^{1/2}. The other inverse, the secondary branch, is x = -z^{1/2}.

For example, suppose ρ_{x} = Π(x), which is equal to 1 if the argument is between -½ and ½ and zero otherwise. Now, z may become ¼ if x becomes either ½ or -½. Hence, the density of z must account for this doubling up of probabilities; in this case ρ_{z} = ρ_{x}(z^{1/2})z^{-1/2}. The situation is more complicated when ρ_{x} = Π(x – a), a < ½, since some of the positive values of x do not have a negative counterpart with which to “double up”. The probability density Π(x) is also interesting because it shows how the Taylor POE formula fails in the case of z = x^{2}. For ρ_{x} = Π(x) the mean value is zero, and the variance is 1/12. The Taylor POE estimate for the error in z would be zero, σ_{z}^{2} = 4〈x〉^{2}σ_{x}^{2} = 0, whereas the actual variance in z is 1/180. (Also note that the mean value of z is 1/12, and not equal to 〈x〉^{2}, which is zero.)
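These numbers are easy to confirm by sampling (a sketch; seed and sample size arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-0.5, 0.5, size=1_000_000)  # density Pi(x): mean 0, variance 1/12
z = x**2

print(z.mean())  # ~1/12, not <x>^2 = 0
print(z.var())   # ~1/180, whereas the Taylor POE predicts 0
```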

**Correlated random variables.** Random variables are correlated if the outcome of one RV influences the outcome of another. For example, z and x are correlated in the case where z = x + y. An operational definition of correlation is: if the mean value of the product differs from the product of the respective mean values then the RVs are correlated, i.e.

〈xy〉 ≠ 〈x〉〈y〉. (8)

Suppose that y consists of an IRV, ξ say, plus a fraction of another IRV, x, such that y = ξ + fx, and assume that x and ξ have the same probability density. The mean value of the product of x and y is not equal to the product of the mean values, but includes an extra term,

〈xy〉 = 〈x〉〈y〉 + fσ_{x}^{2}. (9)

When taking the mean value of a set of measurements, the usual assumption is that all the measurements are independent. Whether or not the measurements are independent has no effect on the estimate of the mean value itself. However, the degree of dependence of the measurements does have an impact on the estimate of the standard error. The standard error is the error in the estimate of the mean. This error will be either over- or underestimated when the correlations in the measurements are ignored.

Suppose we have a set of measurements and we calculate a new RV as the sum of all the measurements, S = Σ_{i=1,…,N} x_{i}. The expectation (or mean value) of the sum is the sum of the expectations, i.e. 〈S〉 = Σ_{i=1,…,N}〈x_{i}〉, and if all the expectations are equal (if all the measurements were taken from the same probability density) then we may write 〈S〉 = N〈x〉. The variance of S can be calculated as σ_{S}^{2} = Nσ_{x}^{2} + 2Σ_{i=1,…,N-1}Σ_{j=i+1,…,N}cov(x_{i}, x_{j}); if the x_{i} are independent then σ_{S}^{2} = Nσ_{x}^{2}. When we divide 〈S〉 by N to get the estimate of the mean of x, we divide σ_{S}^{2} by N^{2} to get the estimate of the variance in the mean, σ_{〈x〉}^{2} = ^{Nσ_{x}^{2}}⁄_{N^{2}} = ^{σ_{x}^{2}}⁄_{N}, and the standard error is the standard deviation divided by the square root of the number of measurements. If, however, the measurements are correlated then the standard error becomes σ_{〈x〉}^{2} = ^{σ_{x}^{2}}⁄_{N} + ^{2}⁄_{N^{2}}Σ_{i=1,…,N-1}Σ_{j=i+1,…,N}cov(x_{i}, x_{j}). The covariance of x and y is defined as the difference between the mean value of the product of x and y, and the product of the mean values of x and y, cov(x, y) = 〈xy〉 – 〈x〉〈y〉.
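The effect of ignored correlations on the standard error can be demonstrated with a shared-offset noise model (my own illustrative choice: every one of the N measurements in a set shares a single random offset c, so cov(x_i, x_j) equals the variance of c for every pair):

```python
import numpy as np

rng = np.random.default_rng(5)
N, trials = 100, 2000

noise = rng.normal(0.0, 1.0, size=(trials, N))  # independent part of each measurement
c = rng.normal(0.0, 1.0, size=(trials, 1))      # offset shared within each set
x = noise + c

# naive standard error, sigma/sqrt(N), estimated within each set (blind to c)
naive = (x.std(axis=1, ddof=1) / np.sqrt(N)).mean()
# actual scatter of the estimated means across many repeated experiments
actual = x.mean(axis=1).std(ddof=1)

print(naive, actual)  # the naive estimate badly underestimates the true error
```

Within a single set the shared offset is invisible, so the naive formula sees only the independent noise; across repeated experiments the covariance term dominates the error of the mean.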

If y = x, then cov(x, x) is the auto-covariance, which is just the variance. For IRVs cov(x, y) = 0, and for y = ξ + fx the covariance is not zero, cov(x, y) = fσ^{2}_{x}. Another definition related to covariance is the correlation, which is the expectation of the product, 〈xy〉. If we have no information concerning the independence of RVs, the variance of the sum is the sum of the variances plus twice the covariance, σ^{2}_{x+y} = σ^{2}_{x} + 2cov(x, y) + σ^{2}_{y}. Hence, if y = x then σ^{2}_{2x} = 4σ^{2}_{x}, in agreement with the Taylor and the conditional POE estimates, showing that for scalar multiplication, z = 2x, the POE estimate is exact.
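A quick sampling check of cov(x, y) = fσ_{x}^{2} and of the variance-of-a-sum formula (a sketch; f, the Gaussian densities, seed, and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
f = 0.5
xi = rng.normal(0.0, 1.0, size=n)
x = rng.normal(0.0, 1.0, size=n)  # same density as xi
y = xi + f * x                    # y is correlated with x

cov = (x * y).mean() - x.mean() * y.mean()  # should be f * var(x) = 0.5
var_sum = (x + y).var()                     # var(x) + var(y) + 2 cov = 1 + 1.25 + 1.0 = 3.25

print(cov, var_sum)
```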

Perhaps in some cases the reason a model does not agree with the data is that the errors do not include the covariance.

**Densities that move.** In the example above, it was assumed that all the samples were drawn from the same density. Unfortunately this assumption can be violated. For example, suppose you are sitting in a restaurant. To pass the time, you begin to count the number of customers who enter the restaurant in five-minute intervals. The sampled data from this type of measurement follow the well-known Poisson distribution. However, when you analyze your results you find that the variance differs from the mean. What happened? On close inspection you realize that your experiment began just after breakfast and concluded after lunch. At lunchtime you should expect the density to change, as more people tend to go into a restaurant at that time. When you reanalyze the data and separate the pre-lunch from the lunchtime measurements you find, to your satisfaction, that the separate results are indeed Poisson, but with different mean values (Figure 2).
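The restaurant scenario can be simulated directly (a sketch; the two rates and the interval counts are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
pre_lunch = rng.poisson(lam=3, size=500)  # quiet period: ~3 customers per interval
lunch = rng.poisson(lam=12, size=500)     # busy period: ~12 customers per interval
combined = np.concatenate([pre_lunch, lunch])

# Pooled data: variance far exceeds the mean, so it no longer looks Poisson
print(combined.mean(), combined.var())
# Separated: each period recovers variance ~ mean, the Poisson signature
print(pre_lunch.mean(), pre_lunch.var())
print(lunch.mean(), lunch.var())
```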

**Cascades.** Among all the things one can do with RVs, the most interesting is the cascade. A cascade is a sequence of random events where the next event in the sequence depends on the result of the current random event. The simplest cascade consists of two random variables: the first random variable determines the number of samples taken from the density of the second, and these samples are summed. For example, suppose you had seven regular (fair) dice, one green and the rest red. Each “measurement” consists of first throwing the green die (say the result is a 3), then rolling 3 red dice and adding the results. Each measurement is thus the sum of a random number of red dice. The mean value and the variance of this cascade can be calculated! Furthermore, the conditional formula introduced earlier, Eq. (2), can be used to determine the density. As an exercise, try to prove that the mean of the example given above is 12.25, and the variance is 735/16.
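A Monte Carlo check of the dice cascade (a sketch; the vectorized mask is just one way to roll a green-die's worth of red dice per measurement). The predicted values follow from the standard compound-sum results 〈S〉 = 〈n〉〈x〉 and σ_{S}^{2} = 〈n〉σ_{x}^{2} + σ_{n}^{2}〈x〉^{2}, with 〈n〉 = 〈x〉 = 3.5 and σ_{n}^{2} = σ_{x}^{2} = 35/12:

```python
import numpy as np

rng = np.random.default_rng(8)
trials = 1_000_000

green = rng.integers(1, 7, size=trials)     # green die: how many red dice to roll
red = rng.integers(1, 7, size=(trials, 6))  # up to six red dice per measurement
mask = np.arange(6) < green[:, None]        # keep only the first `green` dice
s = (red * mask).sum(axis=1)                # sum of a random number of red dice

print(s.mean())  # ~12.25
print(s.var())   # ~735/16 = 45.9375
```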

Among the subjects of mathematics, in my opinion, probability theory ranks with logic in terms of difficulty. It is a good thing probability theory is difficult; otherwise there would be no profit in casinos and insurance companies. If probability theory were easy then everyone would know they can’t win at gambling, and they would see that insurance companies charge far more than the expected costs of care and replacement. How often does a telephone cable break? Once a month? Once a year? Once a decade? Is it more profitable to take my chances or pay the telephone company insurance against cable breakage? Rest assured that the insurance company has already worked that out. Do you think they want to break even or make a tidy (or hefty) profit? The real question of whether to buy insurance is to ask, “what happens to the money?” Every day the telephone cable does not break, would you rather pay the insurance, or invest the money as you see fit? If the cable breaks, then pay to get it fixed. If you cannot afford to get it fixed, then get a loan. If you cannot get a loan, then go without a cable phone and get a cell phone, or cancel your phone line and send mail.

**Conclusion.** The Taylor POE estimate is useful for a restricted subset of the propagation of error problems that will be encountered. The conditional POE estimate is more accurate, but it quickly becomes very difficult to use when the function through which the RVs are passed is non-linear. In this situation the alternative is to apply the function to the measurements themselves, rather than to the mean values. This way the probability density can be estimated, and the new values (mean and variance) measured from the new random variables, f(x). Unfortunately, we often do not have all the measurements, but instead are given only the mean and variance. In this case an educated guess about the probability density is necessary to use the conditional POE estimate; otherwise the Taylor POE estimate is the only choice. Caution is always advised; do not blindly apply any old POE estimate. Do not be blinded by random variables.

**References**

1. John R. Taylor, *An introduction to error analysis*, 2nd ed. (University Science Books, Sausalito, California, 1997)

2. J. Laurie Snell, *Introduction to probability*, 1st ed. (Random House, New York, 1988)

3. Ronald N. Bracewell, *The Fourier transform and its applications*, 3rd ed. (McGraw-Hill, New York, 2000)