## Friday, August 18, 2017

### Sums and Differences of Random Numbers

Here's a problem that cropped up in the course of some calculations I have been working on recently. In stating the problem, I'll include the physical context, though this context isn't important for the rest of the discussion, which applies generally to certain manipulations on probability distributions:

A high-energy massive particle (e.g. a proton or α-particle), whose initial kinetic energy is governed by a certain probability distribution, passes through a slab of material. As it travels through the material, it scatters some of the electrons inside the slab, dissipating a small fraction of its energy. The amount of its energy that it loses also has some probability distribution (the so-called 'straggling function'). What is the probability distribution over the energy of the particle as it exits the slab of material?

I'm sure all of us wrestle with exactly this question several times each day. The problem concerns the probability distribution over the difference between two independent random variables [1] (in this case the particle's initial energy and the energy it deposits in the slab of material).

The route to solving this problem involves utilizing the solution to a related problem, namely the probability distribution over the sum of two random variables, so first let's look at that.

#### The sum of two independent random numbers

In the previous post, I stated that the distribution over this sum is the convolution of the two distributions from which the two random variables are sampled. First, I'll give my explanation of why this is so. Later, to mitigate the risk that my reasoning is wrong, I'll show the results of some computational experiments that test this reasoning.

Suppose we have a parameter, x, and two probability distributions, which we model using continuous functions f(x) and g(x), from which the numbers x1 and x2 are respectively sampled. Let's call the sum of these two xsum.

For any given value of xsum, there are many possibilities for how we got it. We could have a small value for x1 added to a large value for x2. Or, for example, we could add any small number, δ, to x1, and then subtract the same δ from x2, to get the same outcome for xsum.

For a given value of xsum and a given value of x1, x2 must be given by (xsum - x1), and the probability for that event is given by the product rule, simplified by our assumption of independence [2]:

$$P(x_1, x_2) = f(x_1)\,g(x_{sum} - x_1)$$

But, as we noted, there are many combinations of x1 and x2 that could produce any given xsum. Thus, the probability for any particular xsum will be given by the extended sum rule (noting the exclusivity of the different events): we just add up the probabilities for all the contributing possibilities:

$$P(x_{sum}) = \int_{-\infty}^{\infty} f(x_1)\,g(x_{sum} - x_1)\,dx_1 \tag{1}$$

The integral given in Equation 1 is the definition of the convolution of the two functions f(x) and g(x), denoted f(x)*g(x).
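As a quick sanity check of Equation 1 on a discrete example (a toy case, not from the calculation above): the distribution over the total shown by two fair dice is the convolution of the two identical single-die distributions.

```python
import numpy as np

# Toy check of Equation 1 with two fair dice: the PMF of the sum of two
# independent rolls is the convolution of the two single-die PMFs.
die = np.full(6, 1.0 / 6.0)       # P(face) for faces 1..6
total = np.convolve(die, die)     # PMF over totals 2..12 (indices 0..10)

# Index 5 corresponds to a total of 7, the most likely outcome: 6/36
```

The familiar triangular dice distribution pops straight out of the discrete version of the integral.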

#### Quality assurance

In principle, this argument should be enough, but since computers are now powerful enough that running a decent Monte-Carlo simulation costs very little, it is reasonable to test our reasoning using this technique.

The easiest way to perform this Monte-Carlo calculation is to convert the PDFs f(x) and g(x) to CDFs, then use a numerical random number generator, uniform between 0 and 1, to sample x1 and x2. At each iteration, we get two random numbers, one for each of f(x) and g(x), then we simply find the value of x at which the appropriate CDF matches the random number. We then build a histogram of the generated sums.
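A minimal sketch of that inversion step (the array values here are arbitrary, and numpy.searchsorted stands in for the "find where the CDF first meets the random number" search; the appendix gives a more explicit recipe):

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical, un-normalized PDF on a discrete grid
pdf = np.array([1.0, 4.0, 9.0, 4.0, 1.0])
cdf = np.cumsum(pdf) / pdf.sum()          # CDF, rising to exactly 1

# Draw uniform random numbers and find, for each, the first grid index
# at which the CDF meets or exceeds it
r = rng.uniform(0.0, 1.0, size=10_000)
samples = np.searchsorted(cdf, r)

# The histogram of the samples should approximate the normalized PDF
counts = np.bincount(samples, minlength=len(pdf)) / len(r)
```

Each grid point is sampled with probability equal to its share of the CDF increment, which is exactly its normalized PDF value.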

The results of this procedure are plotted below for arbitrary functions, f(x) and g(x), together with the result of direct calculation of the integral in Equation 1 (see appendix below for my simple numerical implementation of this).

Evaluation of Equation 1 (solid red curve, marked 'Direct integration') is consistent, firstly, with common-sense expectation: the result has its maximum at x = 9,000, i.e. the sum of the locations where f(x) and g(x) are maximized (7,000 and 2,000, respectively), and has the expected shape, with the same long right-hand tail as g(x) and the same rounded peak as f(x).

Secondly, the result matches perfectly the Monte-Carlo prediction (source code also shown in the appendix). Here I sampled 10,000 sums, x1 + x2, and rescaled the result to the height of the normalized PDF.

Logical analysis and simple experimentation agree. Our result can be considered robust! (And a couple of centuries worth of mathematicians, who already knew the thing to be correct, can continue to rest in peace.)

#### Then why does the analytical result matter?

In some special cases, analytical formulas can be derived from Equation 1. Beyond those special cases, though, we usually need to think about computational speed. The Monte-Carlo technique is certainly cheap now compared to 50 or 100 years ago, when scientists and engineers had to spend much more of their time finding analytical or approximately analytical solutions to all sorts of problems. Thanks to a special property of the convolution operation, however, there is an even quicker way to calculate convolutions numerically.

This property, termed the convolution theorem, states that f(x)*g(x) is the same as the inverse Fourier transform of the product of the Fourier transforms of f(x) and g(x). Thanks to Fast Fourier Transform (FFT) algorithms, numerical convolutions can therefore be computed in very little time.
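The theorem is easy to illustrate numerically (the arrays here are arbitrary, and numpy's FFT routines are used directly rather than any packaged convolution function):

```python
import numpy as np

# Two arbitrary non-negative arrays standing in for sampled f(x) and g(x)
f = np.array([0.0, 1.0, 3.0, 2.0, 0.5])
g = np.array([2.0, 1.0, 0.0, 0.0])

# Direct convolution
direct = np.convolve(f, g, mode='full')

# Convolution theorem: transform, multiply, invert. The transforms must be
# zero-padded to the full output length to avoid circular wrap-around.
n = len(f) + len(g) - 1
via_fft = np.fft.irfft(np.fft.rfft(f, n) * np.fft.rfft(g, n), n)

print(np.allclose(direct, via_fft))   # prints True
```

For large arrays the transform route wins because the FFT costs O(n log n) operations, against O(n²) for the direct sum.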

The convolve() function in the numpy package for Python, for example, runs this kind of calculation very quickly. When I repeated the calculation in the above figure using numpy.convolve() instead of my (admittedly crude) integration procedure, I got pretty much exactly the same result, but it executed about 1,000 times faster. (Strictly speaking, numpy.convolve() uses the direct method rather than an FFT; scipy.signal.fftconvolve() is an FFT-based alternative. Much of this performance improvement, then, is due not to the logic of an improved algorithm but to its implementation: numpy calls code that is written in C, C++, and Fortran, and can therefore execute the same logic much faster than it could in native Python.)

#### The difference between two independent random numbers

Now we just need to find a way to adapt our result to problems of the form stated at the beginning of this article. Modifying our earlier reasoning to consider xdiff = x1 - x2, we note that for a given x2 and xdiff, it must be that x1 = xdiff + x2. Consequently, we find that our required probability distribution is

$$P(x_{diff}) = \int_{-\infty}^{\infty} f(x_{diff} + x_2)\,g(x_2)\,dx_2 \tag{2}$$
The only real difference [3] between Equations 1 and 2 is that the minus sign in Equation 1 has become a plus sign in Equation 2.

Now, Equation 2 is technically sufficient to solve the problem but, as we have seen, direct integrations of the kind in Equations 1 and 2 can be pretty slow. Since Equation 2 is so tantalisingly similar to Equation 1, it would be great if there were some mathematical trick to make Equation 2 look more like Equation 1, allowing us to calculate P(xdiff) using our super-speedy FFT-based numerical procedure. It turns out there is such a trick.

Instead of subtracting a random number drawn from g(x), what we can do is add a number drawn from g(-x), which is g(x) reflected through the y-axis. This is the function we'd like to convolve with f(x).

Unfortunately, convolution algorithms, such as numpy.convolve(), don't know anything about the x-axis: they only receive a set of y-values, so as far as the algorithm is concerned, x = 0 corresponds to the number at the beginning of the list, and so on. There is no direct way for us to tell numpy.convolve() that these y-values correspond to negative x-values.

All is not lost, however. If we simply reverse the list of numbers specifying the values of g(x), and convolve with that, it will be as though we are calculating the distribution over the sum of numbers drawn from f(x) and g(-x), except that the result will be shifted to the right by an offset set by the length of the array defining g(x). Thus, to recover the correct position of the convolved distribution, we must chop that offset off the beginning of the list of numbers specifying the result. My code to execute this is shown in the appendix.
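The reversal-and-chop reasoning can be checked on a small discrete example (the probabilities here are arbitrary): convolve f with the reversed g, and compare against brute-force enumeration of every (x1, x2) pair, remembering that index k of the full convolution corresponds to xdiff = k - (len(g) - 1).

```python
import numpy as np

# Arbitrary PMFs for x1 and x2 on the same grid (indices 0, 1, 2)
f = np.array([0.1, 0.5, 0.4])
g = np.array([0.3, 0.7, 0.0])

# The trick: reverse g and convolve. Index k of the full result then
# corresponds to the difference xdiff = k - (len(g) - 1).
trick = np.convolve(f, g[::-1], mode='full')

# Brute force: accumulate probability over every (x1, x2) pair
brute = np.zeros(len(f) + len(g) - 1)
for x1, pf in enumerate(f):
    for x2, pg in enumerate(g):
        brute[x1 - x2 + len(g) - 1] += pf * pg

print(np.allclose(trick, brute))   # prints True
```

The agreement confirms both the reversal trick and the size of the offset it introduces.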

Testing the procedure is now super easy. We already have the Monte-Carlo procedure that we used above, and we know that it works. Converting it to our current purposes requires only changing one '+' to a '-'. The verification looks like this:
The overlap between the random experiment and the nifty FFT procedure is good enough to conclude that my reasoning was again sound. For your interest, my Monte-Carlo took almost 4 times longer to run, but gives a spectrum with only 50 bins, to cover a range covered by about 4,000 bins with the FFT method. Thus, I'd need to run the Monte-Carlo about 80 times longer again, to achieve the same resolution, while not compromising on the signal-to-noise ratio.

#### Notes

[1] In fact, this is an approximation. The straggling function is really a function of the particle's initial energy, so our two random variables are not strictly independent. If the distribution over the initial energy is narrow enough, however, it'll be safe to assume that any changes to the straggling function over the range of energies are too small to significantly affect the outcome of the calculation.

[2] Events A and B are conditionally independent if and only if P(A|B,I) = P(A|I) and P(B|A,I) = P(B|I).

[3] The convolution operation is commutative: f(x)*g(x) is identical to g(x)*f(x).

#### Appendix: Python source code

Here is the function I used to calculate the distribution over the sum of two numbers, by direct calculation of the integral in Equation 1 (this method was about 1,000 times slower than the numpy.convolve() function):

```python
import numpy

def sum_by_integration(f, g):
    # f and g are numpy arrays giving the magnitude of the functions at each
    # location. Note that f and g are defined on the same grid.
    n = len(f) + len(g) - 1                 # length of the full convolution
    s = numpy.zeros(n)

    for ctr in range(n):
        for step in range(len(f)):
            if 0 <= (ctr - step) < len(g):
                s[ctr] += f[step] * g[ctr - step]

    return s
```

Here's my function to calculate the CDF, for the Monte-Carlo method. Numpy has its own function to do this, numpy.cumsum(), but I'll give you this recipe, which can be translated into other languages:

```python
def CDF(pdf):
    # pdf is an array giving the magnitude of the probability
    # distribution function at each location.
    cdf = numpy.zeros(numpy.shape(pdf))
    cdf[0] = pdf[0]

    counter = 0
    for p in pdf[1:]:
        counter += 1
        cdf[counter] = cdf[counter - 1] + p

    cdf = cdf / float(cdf[-1])              # (cdf always goes to unity)

    return cdf
```

And here's the Monte-Carlo implementation, itself:

```python
def sum_by_MC(f, g, N=10000):
    cdf1 = CDF(f)
    cdf2 = CDF(g)

    sums = []

    counter = 0
    while counter < N:
        counter += 1
        r1, r2 = numpy.random.uniform(0, 1, 2)

        # use nonzero() to find the index where the CDF first exceeds the random number
        x1 = numpy.nonzero(cdf1 >= r1)[0][0]
        x2 = numpy.nonzero(cdf2 >= r2)[0][0]
        sums.append(x1 + x2)

    hist, edges = numpy.histogram(sums, bins=50)

    return hist, edges
```

My code to calculate the distribution of the difference between two random variables is given next:

```python
def difference_by_FFT(f, g):
    # calculate the distribution over f - g

    g = g[::-1]                                      # reverse g(x) to look like a shifted g(-x)
    difference = numpy.convolve(f, g, mode='full')   # numpy's fast convolution routine
    difference = difference[len(g) - 1:]             # chop off the shift introduced by the reversal
    difference = difference / numpy.sum(difference)  # normalize

    return difference
```