Recently, I unveiled a collection of mathematical material, intended partly as an easy entry point for people interested in learning probability from scratch. One thing that struck me, though, as conspicuously missing was a set of simple examples of basic forward problems. To correct this, the following is a brief tutorial, illustrating application of some of the most basic elements of probability theory.
Virtually anybody who has studied probability at school has studied forward problems. Many, though, will have never learned about inverse problems, and so the term 'forward problem' is often never even introduced. What are referred to as forward problems are what many identify as the entirety of probability theory, but in reality, this is only half of it. 
Forward problems start from specification of the mechanics of some experiment, and work from there to calculate the probabilities with which the experiment will have certain outcomes. A simple example is: a deck of cards has 52 cards, including exactly 4 aces, what is the probability to obtain an ace on a single draw from the deck? The name for the discipline covering these kinds of problems is sampling theory.
Inverse problems, on the other hand, turn this situation around: the outcome of the experiment is known, and the task is to tease out the likely mechanics that lead to that outcome. Inverse problems are the primary occupation of scientists. They are of great interest to everybody else, as well, whether we realize it or not, as all understanding about the real world results from analysing inverse problems. Solution of inverse problems, though, is tackled using Bayes' theorem, and therefore also requires the implementation of sampling theory.
To answer the example above (drawing an ace from a full deck), we need only one of the basic laws: the Bernoulli urn rule. Due to the lack of any information to the contrary, symmetry requires that each card is equally probable to be drawn on a single draw, and you can quickly verify that the answer is P(Ace) = 1/13. Here is another example, requiring mildly greater thought: 
The Birthday Paradox
Not really a paradox, but supposedly many find the result surprising. The problem is the following:
What is the minimum number of people needed to be gathered together to ensure that the probability that at least two of them share a birthday exceeds one half?
It doesn't look immediately like my definition of a forward problem, but actually it is several - to get the solution, we'll calculate the desired probability for all possible numbers of people, until we pass the 50% requirement. A 'birthday' is a date consisting of a day and a month, like January 1st - the year is unimportant. Assume also that each year has 365 days. To maximize our exposure to as many basic rules as possible, I'll demonstrate 2 equivalent solutions.
Solution 1
Having at least 1 shared birthday in a group of people can be fulfilled in several possible ways. There may be exactly 2 matching birthdays, or exactly 3 matching, or exactly 4, and so on. We don't want to calculate all these probabilities. Instead, we can invoke the sum rule to observe that
| 
 | (1) | 
To get P(none shared), we'll use the Bernoulli urn rule again. If the number of people in the group is n, then we want the number of ways to have n different dates, divided by the number of ways to have any n dates. To have n different dates, we must select n objects from a pool of 365, where each object can be selected once at most. Thus we need the number of permutations, 365Pn, which, from the linked glossary entry, is
| (2) | 
That's our numerator, the denominator is the total number of possible lists of n birthdays. Well, there are 365 ways to chose the first, and 365 ways to chose the second, thus 365 × 365 ways to chose the first two. Similarly 365 × 365 × 365 ways to chose the first three, thus 365n ways to chose n dates that may or may not be different. Taking the ratio, then
    
    
    
    
And now we only need to crunch the numbers for different values for n, to find out where P first exceeds 50%. To calculate this, I've written a short python module, which is in the appendix, below (a spreadsheet program can also do this kind of calculation). Python is open-source, so in principle anybody with a computer and an internet connection (e.g. You) can use it. The graph below shows the output for n equal to 1 to 60:
The answer, contrary to Douglas Adams, is 23.
Solution 2
The second method is equivalent to the first, it still employs that trick in equation (1) with the sum rule, and exactly the same formula, equation (4), is derived, but this time we'll use the sum rule, the product rule, and the logical independence of different people's birthdays.
To visualize this solution more easily, lets imagine people entering a room, one by one, and checking each time whether or not there is a shared birthday. When the first person enters the room, the probability there is no shared birthday in the room is 1. When the second person enters the room, the probability that he shares his birthday with the person already there is 1/365,so the probability that he does not share the other person's birthday, from the sum rule, is (1 - 1/365).
The probability for no shared birthdays, with n = 2, is given by the product rule:
    
    
We can simplify this, though. Logical independence means that if we know proposition A to be true, the probability associated with proposition B is the same as if we know nothing about A: P(B | I) = P(B | AI). Thus the probability that person B's birthday is on a particular date is not changed by any assumptions about person A's birthday.
So the product rule reduces in this case to
Then the probability of no matching birthdays with n = 2 is 1 × (1-1/365). For n = 3, we repeat the same procedure, getting as our answer 1 × (1-1/365) × (1-2/365).
For n people, therefore, we get
which re-arranges to give the same result as before:
    
    
As before, we subtract this from 1 to get the probability to have at least 1 match among the n people present.
| (3) | 
Thus, when we apply the sum rule, the desired expression is
| (4) | 
Solution 2
The second method is equivalent to the first, it still employs that trick in equation (1) with the sum rule, and exactly the same formula, equation (4), is derived, but this time we'll use the sum rule, the product rule, and the logical independence of different people's birthdays.
To visualize this solution more easily, lets imagine people entering a room, one by one, and checking each time whether or not there is a shared birthday. When the first person enters the room, the probability there is no shared birthday in the room is 1. When the second person enters the room, the probability that he shares his birthday with the person already there is 1/365,so the probability that he does not share the other person's birthday, from the sum rule, is (1 - 1/365).
The probability for no shared birthdays, with n = 2, is given by the product rule:
| 
 | (5) | 
We can simplify this, though. Logical independence means that if we know proposition A to be true, the probability associated with proposition B is the same as if we know nothing about A: P(B | I) = P(B | AI). Thus the probability that person B's birthday is on a particular date is not changed by any assumptions about person A's birthday.
So the product rule reduces in this case to
| 
 | (6) | 
For n people, therefore, we get
which re-arranges to give the same result as before:
| (7) | 
As before, we subtract this from 1 to get the probability to have at least 1 match among the n people present.
Appendix
import numpy as np # package for numerical calculations
import matplotlib.pyplot as plt # plotting package
def permut(n, k): # count the number of permutations, nPk
answer = 1
for i in np.arange(n, n-k, -1):
answer *= i
return answer
def share_prob(n): # calculate P(some_shared_bDay | n)
p = []
for i in n:
p.append(1.0 - permut(365, i)/np.power(365.0, i))
p = np.array(p)
return p
def plot_prob(): # plot distribution & find 50% point
x = np.arange(1.0, 61.0, 1.0) # list the integers, n, from 1 to 60 inclusive
y = share_prob(x) # get P(share|n)
x50 = x[y>0.5][0] # calculate x, y coordinates where 50%
y50 = y[y>0.5][0] # threshold first crossed
fig = plt.figure() # create plot
ax = fig.add_subplot(111)
ax.plot(x, y, linewidth=2)
# mark out the solution on the plot:
ax.plot([x50, x50], [0, y50], '-k', linewidth=1)
ax.plot([0, x50], [y50, y50], '-k', linewidth=1)
ax.plot(x50, y50, 'or', markeredgecolor='r', markersize=10)
ax.text(x50*1.1, y50*1.05, 'Smallest n giving P > 0.5:', fontsize=14)
ax.text(x50*1.1, y50*0.95, '(%d, %.4f)' %(x50, y50), fontsize=14)
# add some other information:
ax.set_title('P(some_shared_birthday | n)', fontsize=15)
ax.set_ylabel('Probability', fontsize=14)
ax.set_xlabel('Number of people, n', fontsize=14)
ax.axis([1, 60, 0, 1.05])

