Maximum Entropy: July 2013

Sunday, July 28, 2013

Forward Problems

Recently, I unveiled a collection of mathematical material, intended partly as an easy entry point for people interested in learning probability from scratch. One thing that struck me, though, as conspicuously missing was a set of simple examples of basic forward problems. To correct this, the following is a brief tutorial, illustrating application of some of the most basic elements of probability theory.

Virtually anybody who has studied probability at school has studied forward problems. Many, though, will have never learned about inverse problems, and so the term 'forward problem' is often never even introduced. What are referred to as forward problems are what many identify as the entirety of probability theory, but in reality, this is only half of it.

Forward problems start from specification of the mechanics of some experiment, and work from there to calculate the probabilities with which the experiment will have certain outcomes. A simple example is: a deck of cards has 52 cards, including exactly 4 aces, what is the probability to obtain an ace on a single draw from the deck? The name for the discipline covering these kinds of problems is sampling theory.

Inverse problems, on the other hand, turn this situation around: the outcome of the experiment is known, and the task is to tease out the likely mechanics that lead to that outcome. Inverse problems are the primary occupation of scientists. They are of great interest to everybody else, as well, whether we realize it or not, as all understanding about the real world results from analysing inverse problems. Solution of inverse problems, though, is tackled using Bayes' theorem, and therefore also requires the implementation of sampling theory.

To answer the example above (drawing an ace from a full deck), we need only one of the basic laws: the Bernoulli urn rule. Due to the lack of any information to the contrary, symmetry requires that each card is equally probable to be drawn on a single draw, and you can quickly verify that the answer is P(Ace) = 1/13. Here is another example, requiring mildly greater thought:

The Birthday Paradox

Not really a paradox, but supposedly many find the result surprising. The problem is the following:

What is the minimum number of people needed to be gathered together to ensure that the probability that at least two of them share a birthday exceeds one half?

It doesn't look immediately like my definition of a forward problem, but actually it is several - to get the solution, we'll calculate the desired probability for all possible numbers of people, until we pass the 50% requirement. A 'birthday' is a date consisting of a day and a month, like January 1st - the year is unimportant. Assume also that each year has 365 days. To maximize our exposure to as many basic rules as possible, I'll demonstrate 2 equivalent solutions.

Solution 1

Having at least 1 shared birthday in a group of people can be fulfilled in several possible ways. There may be exactly 2 matching birthdays, or exactly 3 matching, or exactly 4, and so on. We don't want to calculate all these probabilities. Instead, we can invoke the sum rule to observe that

P(at least one shared) = 1 - P(none shared)

(1)

To get P(none shared), we'll use the Bernoulli urn rule again. If the number of people in the group is n, then we want the number of ways to have n different dates, divided by the number of ways to have any n dates. To have n different dates, we must select n objects from a pool of 365, where each object can be selected once at most. Thus we need the number of permutations, ₃₆₅P_n, which, from the linked glossary entry, is

(2)

That's our numerator, the denominator is the total number of possible lists of n birthdays. Well, there are 365 ways to chose the first, and 365 ways to chose the second, thus 365 × 365 ways to chose the first two. Similarly 365 × 365 × 365 ways to chose the first three, thus 365ⁿ ways to chose n dates that may or may not be different. Taking the ratio, then

(3)

Thus, when we apply the sum rule, the desired expression is

(4)

And now we only need to crunch the numbers for different values for n, to find out where P first exceeds 50%. To calculate this, I've written a short python module, which is in the appendix, below (a spreadsheet program can also do this kind of calculation). Python is open-source, so in principle anybody with a computer and an internet connection (e.g. You) can use it. The graph below shows the output for n equal to 1 to 60:

The answer, contrary to Douglas Adams, is 23.

Solution 2

The second method is equivalent to the first, it still employs that trick in equation (1) with the sum rule, and exactly the same formula, equation (4), is derived, but this time we'll use the sum rule, the product rule, and the logical independence of different people's birthdays.

To visualize this solution more easily, lets imagine people entering a room, one by one, and checking each time whether or not there is a shared birthday. When the first person enters the room, the probability there is no shared birthday in the room is 1. When the second person enters the room, the probability that he shares his birthday with the person already there is 1/365,so the probability that he does not share the other person's birthday, from the sum rule, is (1 - 1/365).

The probability for no shared birthdays, with n = 2, is given by the product rule:

P(AB | I) = P(A | I)P(B | AI)

(5)

We can simplify this, though. Logical independence means that if we know proposition A to be true, the probability associated with proposition B is the same as if we know nothing about A: P(B | I) = P(B | AI). Thus the probability that person B's birthday is on a particular date is not changed by any assumptions about person A's birthday.

So the product rule reduces in this case to

P(AB | I) = P(A | I)P(B | I)

(6)

Then the probability of no matching birthdays with n = 2 is 1 × (1-1/365). For n = 3, we repeat the same procedure, getting as our answer 1 × (1-1/365) × (1-2/365).

For n people, therefore, we get

which re-arranges to give the same result as before:

(7)

As before, we subtract this from 1 to get the probability to have at least 1 match among the n people present.

Appendix

import numpy as np # package for numerical calculations
import matplotlib.pyplot as plt # plotting package

def permut(n, k): # count the number of permutations, nPk
answer = 1
for i in np.arange(n, n-k, -1):
answer *= i
return answer

def share_prob(n): # calculate P(some_shared_bDay | n)
p = []
for i in n:
p.append(1.0 - permut(365, i)/np.power(365.0, i))
p = np.array(p)
return p

def plot_prob(): # plot distribution & find 50% point
x = np.arange(1.0, 61.0, 1.0) # list the integers, n, from 1 to 60 inclusive
y = share_prob(x) # get P(share|n)

x50 = x[y>0.5][0] # calculate x, y coordinates where 50%
y50 = y[y>0.5][0] # threshold first crossed

fig = plt.figure() # create plot
ax = fig.add_subplot(111)
ax.plot(x, y, linewidth=2)

# mark out the solution on the plot:
ax.plot([x50, x50], [0, y50], '-k', linewidth=1)
ax.plot([0, x50], [y50, y50], '-k', linewidth=1)
ax.plot(x50, y50, 'or', markeredgecolor='r', markersize=10)
ax.text(x50*1.1, y50*1.05, 'Smallest n giving P > 0.5:', fontsize=14)
ax.text(x50*1.1, y50*0.95, '(%d, %.4f)' %(x50, y50), fontsize=14)

# add some other information:
ax.set_title('P(some_shared_birthday | n)', fontsize=15)
ax.set_ylabel('Probability', fontsize=14)
ax.set_xlabel('Number of people, n', fontsize=14)
ax.axis([1, 60, 0, 1.05])

Saturday, July 20, 2013

Greatness, By Definition

The goals for this blog have always been two-fold: (1) to bring students and professionals in the sciences into close acquaintance with centrally important topics in Bayesian statistics and rational scientific method, and (2) to bring the universal scope and beauty of science to the awareness of as many as possible, both within and outside the field - if you have any kind of problem in the real world, then science is the tool for you.

Effective communication to scientists, though, runs the risk of being impenetrable for non-scientists, while my efforts to simplify make me feel that the mathematically adept reader will be quickly bored.

Helping to make the material on the blog more accessible, therefore, and as part of a very, very slow but steady plan to achieve world domination, here are two new resources I've put together, which I am very happy to announce:

Glossary - definitions of technical terms used on the blog
Mathematical Resource - a set of links explaining the basics of probability from the beginning. Blog articles and glossary entries are linked in a logical order, starting from zero assumed prior expertise.

Both of these now appear in the links list on the right-hand sidebar.

The new resources are partially complete. Some of the names of entries in the glossary, for example, do not yet correspond to existing entries. Regular updates are planned.

The mathematical resource is a near-instantaneous extension of the material compiled for the glossary, and is actually my glacially slow response to a highly useful suggestion made by Richard Carrier, almost one year ago. The material has been organized in what seems to me to be a logical order, and for those interested, may be viewed as a short course in statistics, delivering, I hope, real practical skills in the topic. Its main purpose, though, is to provide an entry point for those interested in the blog, but unfamiliar with some of the important technical concepts.

The glossary may also be useful to those already familiar with the topics. Terms are used on the blog, for example, with meanings different to those of many other authors. Hypothesis testing is one such case, limited by some to denoting frequentist tests of significance, but used here to refer more generally to any ranking of the reliability of propositions.

The new glossary, then, is an attempt to rationalize the terminology, bringing the vocabulary back in line with what it was always intended to mean, not to reflect some flawed surrogates for those original intentions. For the same reason, in some cases alternate terms are used, such as 'falsifiability principle', in preference to the more common 'falsification principle'.

Important distinctions are also highlighted. Morality, for example is differentiated from moral fact. Philosophers are found to be distinct from 'nominal philosophers'. Equally importantly, science is explicitly distanced from 'the activity of scientists'. As a result, morality, philosophy, and science are found to be different words for exactly the same phenomenon.

In a previous article, I warned against excessive reliance on jargon, so it's perhaps worth explaining how the current initiative is not hypocrisy. That article was concerned with over use of unnecessary jargon, which often serves as an impediment to understanding. Symptoms of this include (1) jargon terms replaced with direct (but less familiar) synonyms result in confusion, and (2) vocabulary replaced with familiar terms with inapplicable meanings goes unnoticed. By providing a precise lexicon, we can help to prevent exactly these problems, and others, thus whetting the edge of our analytical blade, and accelerating our philosophical progress.

On a closely related topic, there is a fashionable notion going along the lines that to argue from the definition of words is fallacious, as if definitions of words are useless. This is not correct: argument from definition is a valid form of reasoning, but one that is very commonly misused.

Eliezer Yudkowsky's highly recommendable sequence, A Human's Guide to Words, covers the fallacious application very well. His principal example is the ancient riddle: if a tree falls in a forest, where nobody is present to hear it, does it make a sound? Yudkowsky imagines two people arguing over this riddle. One asserts, "yes, by definition: acoustic vibrations travel through the air," the other responds, "no, by definition: no auditory sensation occurs in anybody's brain." These two are clearly applying different definitions to the same word. In order to reach consensus, they must agree on a single definition.

These two haven't committed any fallacy yet, each is reasoning correctly from their own definitions. But it is a pointless argument - as pointless as me arguing in English, when I only understand English, with a person who only speaks and understands Japanese. Fallacy begins, however, as in the following example.

Suppose you and I both understand and agree on what the word 'rainbow' refers to. One day, though, I'm writing a dictionary, and under 'rainbow,' I include the innocent looking phrase: "occurs shortly after rain." (Well duh, rain is even in the name.) So we go visit a big waterfall and see colours in the spray, and I say "look, it must have recently rained here." Con artists term this tactic 'bait and switch.' I can not legitimately reason in this way, because I have arbitrarily attached not a symbol to a meaning, but attributes to a real physical object.

To show trivially that there is a valid form of argument from definition, though, consider the following truism: "black things are black." This is necessarily true, because blackness is exactly the property I'm talking about when I invoke the phrase "black things." It is not that I am hoping to alter the contents of reality by asserting the necessary truth of the statement, but that I am referring to a particular class of entities, and the entities I have in mind just happen to all be black - by definition.

One might complain, "but I prefer to use the phrase 'black things' not just for things that are black, but also for things that are nearly black." This would certainly be perverse, but it's not in any sense I can think of illegal. Fine, if you want to use the term that way, you may do so. I'll continue to implement my definition, which I find to be the most reasonable, and every time you hear me say the words "black things," you must replace the words with any symbol you like that conveys the required meaning to you. Your symbol might be a sequence of 47 charcoal 'z's marked on papyrus, or a mental image of a yak, I don't care.

Yes, our definitions are arbitrary, but arbitrary in the sense that there is no prior privileged status of the symbols we end up using, and not in the sense that the meanings we attach to those symbols are unimportant.

Here's an example from my own experience. Several times, I have tried unsuccessfully to explain to people my discovery that ethics is a scientific discipline. (By the way I'm not claiming priority for this discovery.) The objections typically go through 3 phases. First is the feeling of hand-waviness, which is understandable, given how ridiculously simple the argument is:

them-	No way, its too simple. You can't possibly claim such an unexpected result with such a thin argument.
me-	OK, show me which details I've glossed over.
them-	[pause...] All right, the argument looks logically sound, but I don't believe it - look at your axioms: why should I accept those? Those aren't the axioms I choose.
me-	Those aren't axioms at all. I don't need to assume their truth. These are basic statements that are true by definition. If you don't like the words I've attached to those definitions, then pick you own words, I'm happy to accommodate them.
them-	YOU CAN'T DO THAT! You're trying to alter reality by your choice of definitions....

And that's the final stumbling block people seem to have the biggest trouble getting over.

Definitions are important. If you think that making definitions is a bogus attempt to alter reality, then be true to your beliefs: see how much intellectual progress you can make without assigning meanings to words. The new <fanfare!> Maximum Entropy Glossary </fanfare!> is an attempt to streamline intellectual progress. If you engage with anything I have written on this blog, then you engage with meanings I have attached to strings of typed characters. In some important cases, I have tried to make those meanings clear and precise. If you find yourself disagreeing with strings of characters, then you are making a mistake. If you disagree with the way I manipulate meanings then we can discuss it like adults, confident that we are talking about the same things.