At the end of my previous post, I argued that an intuitive grasp of Bayesian model comparison is an invaluable asset to those wishing to apply scientific method. Even if one is never going to execute rigorous calculations of the form described in that article, it is still possible to gain important insight into the plausibility of different descriptions of reality, merely by exercising one’s familiarity with the general structure of the formal theory. Here, I’ll illustrate the application of this kind of informal reasoning, which, while not making use of even a single line of algebra, can still be seen to be quite watertight.

One thing I have
tried to get across in my writing here is that scientific method is for everybody and can
address all meaningful questions concerning fact. If a matter of fact has real
consequences for the state of reality, then it is something that can be
scrutinized by science. If it has no real consequences, then it is, at its most
charitable, an extremely poor class of fact. In this article, I’ll apply probability theory to the question of whether or not the universe is the
product of some omnipotent deity. It’s a lot simpler to do than you might
think.

Now there are some (including, sadly, some scientists) who maintain that what I am going to
do here is inappropriate and meaningless. To many of these people, reality is
divided into two classes of phenomena: the natural and the supernatural.
Natural phenomena, they say, are the things that fall into the scope of
science, while the supernatural lies outside of science’s grasp, and cannot be
addressed by rational investigation. This is completely muddle-headed, as I
have argued elsewhere. If we can measure it, then it falls into
science’s domain. If we can’t measure it, then postulating its existence
achieves nothing.

Other disguised
forms of this argument exist. I was once asked by another physicist (and good
friend): ‘How can science be so arrogant to think that it can address all
aspects of reality?’ To which the answer is obvious: if you wish to claim that
there is something real that cannot be investigated scientifically, how can
*you* be so arrogant to think that you know what it is? What could possibly be the basis for this knowledge?

As I said, addressing the existence of God with probability theory is quite simple to
achieve. In fact, it is something that one of my mathematical heroes,
Pierre-Simon Laplace, achieved with a single sentence, in a conversation with
Napoleon I. The conversation occurred when the emperor was congratulating the
scientist on his new book on celestial mechanics, and proceeded as follows:

Napoleon:

*You made the system of the world, you explain the laws of all creation, but in all your book you speak not once of the existence of God!*

Laplace:

*Sire, I had no need of that hypothesis.*

Lagrange
(another mathematician who was also present):

*Ah, but that is such a good hypothesis. It explains so many things!*

Laplace:

*Indeed, Sire, Monsieur Lagrange has, with his usual sagacity, put his finger on the precise difficulty with the hypothesis: it explains everything, but predicts nothing.*

I believe this might be the world’s earliest recorded application of Bayesian model
comparison.

By the way, a
quick note of thanks: when I first came across the full version of this
exchange, I struggled to find strong reason to treat it as more than a legend,
but historian and Bayesian, Richard Carrier, has
pointed me to sources that strongly boost the odds that this conversation was a real
event. Richard also presents arguments from probability theory pertaining to
religious matters. See, for example, this video.

Now, to see what
Laplace was on about, we should think about model comparison in the terms that
I have introduced here and discussed further in the article linked to above. It’s true that Bayesian model comparison would not be formally
described for more than a hundred years after Laplace’s death, but as the
founder of Bayesian inference and a mathematician of extraordinary genius and
natural insight, he must have been capable of perceiving the required logic.
(Part of the beauty of Bayesian statistics is that hypothesis testing,
parameter estimation, and model comparison are really only slightly different
versions of the same problem – this gives it a logical unity and coherence that
other approaches can only dream of enviously.)

To approach the
problem, let’s imagine a data set of just a few points – let’s say 6 points,
which we would like to fit with a polynomial function. The obvious first choice
is to try a straight line. Illustrated below are the imagined data and the
fitted straight line, which is the maximum likelihood estimate.

Because there is noise in the data, the fitted line
misses all the data points, so there are some residuals associated with this
fit. Is there a way to reduce the residuals, i.e. to have a model that passes
closer to the measured data points? Of course there is: just increase the
number of free parameters in the fitting model. In fact, with only six data
points, a polynomial with terms up to and including the fifth power is already
sufficient to guarantee that the residuals are reduced to exactly zero, as
illustrated below, with exactly the same data as before.
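
The two fits can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six data values are invented, least squares stands in for the maximum likelihood estimate (which assumes Gaussian noise of equal size on every point), and the function names `fit_line` and `interpolate` are hypothetical helpers rather than anything from the original analysis.

```python
# Hypothetical data: six noisy points scattered around a straight line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.3, 0.8, 2.4, 2.9, 4.3, 4.6]


def fit_line(xs, ys):
    """Least-squares slope A and intercept B for the model y = A*x + B."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def interpolate(xs, ys, x):
    """Evaluate the unique polynomial of degree len(xs) - 1 that passes
    through every data point (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


a, b = fit_line(xs, ys)
line_residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
poly_residuals = [y - interpolate(xs, ys, x) for x, y in zip(xs, ys)]

line_sse = sum(r * r for r in line_residuals)
poly_sse = sum(r * r for r in poly_residuals)

print(f"straight line (2 parameters): sum of squared residuals = {line_sse:.4f}")
print(f"fifth-order polynomial (6 parameters): sum of squared residuals = {poly_sse:.4f}")
```

Changing the values in `ys` to anything at all leaves the polynomial's residuals at zero, which is exactly the point: the perfect fit tells us nothing about whether the model is any good.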

Is this
sufficient to make the fifth-order polynomial the more likely model? Certainly
not. This more complex model has 6 fitting parameters, as opposed to only 2 for
the linear fit. As I explained previously, though, each additional degree of
freedom adds another dimension to the parameter sample space, which necessarily
reduces the amount of prior probability for the parameters in the maximum
likelihood region – the available prior probability needs to be spread much
more thinly in order to cover the extended sample space. This is the penalty introduced
in the form of the Ockham factor. This reduced prior probability, of
course, results in a lower posterior probability for the model in most cases.
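
For readers who want the penalty in symbols, here is a rough sketch in standard textbook notation, which may not match the notation of the earlier article. The symbols are assumptions made for this illustration: D is the data, M the model, θ a single free parameter, θ̂ its best-fit value, Δθ the width of a flat prior over θ, and δθ the width of the likelihood peak.

```latex
P(D \mid M, I) = \int P(D \mid \theta, M, I)\, P(\theta \mid M, I)\, d\theta
\;\approx\; P(D \mid \hat{\theta}, M, I) \times \frac{\delta\theta}{\Delta\theta}
```

The ratio δθ/Δθ is less than one whenever the data narrow down the parameter, and, to the extent that the parameters can be treated independently, each additional parameter contributes a further factor of this kind.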

Now the
hypothesis that Napoleon and Lagrange wanted Laplace to take seriously, the one
about that omnipotent deity, is one with infinite degrees of freedom. That’s
the definition of omnipotent: there is nothing that God can’t do if it wants
to. That means infinitely many dimensions in the parameter sample space, and
therefore infinitely low prior probability at all points. To see the
probability that God exists vanish to zero, we only need to postulate any
alternative model of reality with finite degrees of freedom. If my
interpretation of Laplace’s comment is correct, this is the fact that he was
able to perceive: that there is simply no amount of evidence that could raise
the hypothesis of an omnipotent deity to a level of plausibility competitive
with other theories of reality.

And what if we
relax the requirement for God to be strictly omnipotent? It makes bugger all difference.
Every prayer supposedly answered or not answered, every person saved from
tragedy or not saved, every event attributed to God’s will represents another
degree of freedom. That’s still a tremendous number of degrees of freedom, and
while it may be finite, it’s still many orders of magnitude greater than the
numbers of free parameters that most specialists would claim to be sufficient
for a complete theory of the universe.

At this point,
we can take note of yet another important scientific principle that can be
recognized as just a special case of Bayes’ theorem, this time Karl Popper’s
principle of falsifiability. Popper recognized that in order for a hypothesis to
be treated as scientific, and worthy of rational investigation, it must be
vulnerable to falsification. That means that a theory must be capable of making
specific predictions, which, if they fail to arise in a suitable experiment,
will identify the theory as false. If a theory is not falsifiable, then any
data nature throws our way can be accommodated by it. This means that the
theory predicts nothing whatsoever. Popper coined the term ‘pseudoscience’ for
theories like this, such as psychoanalysis and astrology.

Now, if a theory
is consistent with all conceivable data sets (i.e. unfalsifiable), this means
that the associated model curve is capable of traversing all possible paths
through the sample space for the data, just like the fifth-order polynomial was able to land exactly on all 6 data points above, regardless of where they were. Assuming that there is no limit to the number of observations we can accrue, this implies that the model has infinite
degrees of freedom, which, as we have just discovered, is really bad news: thanks
to the penalty introduced by the Ockham factor, this leaves you with a theory
with zero credibility.

The fact that we
can derive important and well-known principles of common sense and scientific
methodology, such as Ockham’s razor and the principle of falsifiability, as consequences of Bayes’ theorem illustrates further what I have said above
about the logical unity of this system. This is why I believe that Bayesian
inference, along with the broader theory that it fits into, constitutes the
most comprehensive and coherent theory of how knowledge is acquired. (I’ll get
round to that broader theory some day, but it should be clear already that
Bayes’ theorem is a consequence of more general principles.)

Much of my interest in
science comes from deriving great pleasure from knowledge. Real respect for
knowledge, however, demands an assessment of its quality. It’s not enough to
know that experts say that a meteor hitting the Earth killed off the
dinosaurs - I want to know how convincingly that explanation stands up beside
competing hypotheses. That’s why I’m interested in probability. This is the
theory that permits this necessary appraisal of knowledge, the theory of how we
know what we know, and how well we know it. Science is the systematic attempt
to maximize the quality of our knowledge, and so probability is also
the underlying theory of science.

Let’s recap the
main points, in terms as simple as I can manage. If I try to fit a sequence of
data points with a straight line, Ax + B, then there are 2 adjustable model
parameters, A and B. So the chosen parameters are represented by coordinates,
(x, y), on a two-dimensional plane. If I want a more complicated model, with
one more degree of freedom, then the chosen point becomes (x, y, z), in a 3D
space. Each free parameter results in an additional dimension for the parameter
space. In the 2D case, for example, the prior probability for the point (x, y)
is the product of the individual prior probabilities:

P(x | I) × P(y | I)

Since these prior probabilities are all less than one, the more degrees of freedom there are, the smaller the prior probability will be for any particular point, (x, y, z, …). If there are infinite degrees of freedom, then the prior probability associated with any point in the parameter space will be zero.
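
A small numerical sketch makes the shrinkage concrete. It assumes, purely for illustration, that each parameter can take one of 10 distinguishable values and that the prior is uniform over them; the name `prior_per_point` and the resolution `CELLS` are hypothetical choices, not part of the formal theory.

```python
from fractions import Fraction
from itertools import product


def prior_per_point(cells_per_parameter, n_parameters):
    """Prior probability of a single grid point when a uniform prior is
    shared equally among cells_per_parameter ** n_parameters points."""
    return Fraction(1, cells_per_parameter) ** n_parameters


CELLS = 10  # assumed resolution: 10 distinguishable values per parameter

for n in (2, 3, 6, 50):
    p = prior_per_point(CELLS, n)
    print(f"{n:>2} parameters: prior probability per point = {float(p):.3e}")

# Sanity check for the 2-parameter case: the points together still carry
# a total probability of exactly 1, it is just spread over more of them.
total = sum(prior_per_point(CELLS, 2) for _ in product(range(CELLS), repeat=2))
print("total prior over the 2-D grid:", total)
```

Every extra parameter divides the prior probability of each point by the same factor, so the value heads towards zero as the number of parameters grows without limit.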

The posterior
probability for a model depends strongly on this prior probability distribution
over the parameter space, as shown by equation (4) in my article on the Ockham factor. If the prior probabilities for the points (x, y, z, …) in the
parameter space are all zero, then the probability for the model is also zero.

Any
unfalsifiable theory must have infinite degrees of freedom in order to be able
to remain consistent with all conceivable observations. With limited degrees of
freedom, the complexity of the path traced by the model curve will also be
limited, and the theory will be vulnerable to falsification – the model curve
will not be guaranteed to be able to find a path that travels to each data
point. Any unfalsifiable theory, therefore, has zero posterior probability.
This includes the hypothesis of an omnipotent deity. Because of its unlimited
powers, such an entity is capable of producing any sequence of events it
chooses, meaning that we need a model curve with infinite free parameters to be
guaranteed access to all data points.