Huge congratulations to Peter Higgs and the other theorists who, half a century ago, predicted this particle as part of the standard model, and to the experimentalists at CERN involved in making this recent discovery. We now have extremely compelling evidence of a new particle (new to us, at least) consistent with the Higgs boson.

Sadly, there's much confusion circulating about the meaning of the 5σ significance level reported for the data on the Higgs. Let's first clear up what this means, before examining some of the confusion.

The null hypothesis, which states that there is no signal present in the data, only noise, implies that the magnitude for some parameter, Y, is some value, μ. Since there is random noise in the data, however, the value of Y observed in a measurement is unlikely to be exactly μ, but will follow a probability distribution with mean μ and standard deviation σ. If the null hypothesis is true, then we can expect a measurement of Y to produce a result close to μ. For a normal distribution, the probability is less than 6 × 10^{-7} that a measured value will be 5 standard deviations or further from μ, assuming the null hypothesis is true. This number is the p-value associated with the measurement. (In fact, for this case, a single-tailed test was performed, so the reported p-value for the Higgs boson is half this, about 2.8 × 10^{-7}.)

To recap, the reported p-value is the probability of observing data as extreme as, or more extreme than, the data observed, assuming the null hypothesis is true. Roughly speaking, the p-value is P(D | H_{0}). David Spiegelhalter, over at the Understanding Uncertainty blog, has been monitoring press reports on the recent Higgs announcement, finding a high degree of misunderstanding among the journalists. Many writers described the reported p-value as the probability that the null hypothesis is true, calculated from the data, but this is P(H_{0} | D), a very different thing, as I have described before.

Spiegelhalter has found numerous examples of this error, including writers from New Scientist and Nature, who really ought to know better. These are presumably professional journalists, however, who can perhaps be granted some slack, but what about a renowned cosmologist? How about these words:

> Each experiment quotes a likelihood of very close to “5 sigma,” meaning the likelihood that the events were produced by chance is less than one in 3.5 million.

These words come from Lawrence Krauss, a highly respected theoretical physicist, and they are wrong. I mentioned in my very first blog post that physicists are not generally so great at statistics, but I really don't like being proven right so blatantly. I suppose we really mustn't feel too bad about the mistakes in the press, when the experts can't get it right either.
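The figures quoted so far (less than 6 × 10^{-7}, about 2.8 × 10^{-7}, and one in 3.5 million) can be checked with a few lines of Python. This is only a sketch, and it assumes the noise is Gaussian; the function names `upper_tail` and `two_tailed` are my own, chosen for illustration. Note what is being computed: the probability of data at least this extreme given the null hypothesis, not the probability of the null hypothesis given the data.

```python
import math


def upper_tail(n_sigma: float) -> float:
    """P(Z >= n_sigma) for a standard normal Z: the one-tailed p-value."""
    # erfc is the complementary error function; for a standard normal,
    # P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


def two_tailed(n_sigma: float) -> float:
    """P(|Z| >= n_sigma): a deviation of n_sigma or more in either direction."""
    return 2.0 * upper_tail(n_sigma)


if __name__ == "__main__":
    p_two = two_tailed(5.0)
    p_one = upper_tail(5.0)
    print(f"two-tailed p-value at 5 sigma: {p_two:.3g}")
    print(f"one-tailed p-value at 5 sigma: {p_one:.3g}")
    print(f"one-tailed odds: about one in {1.0 / p_one / 1e6:.1f} million")
```

Using `math.erfc` rather than `1 - cdf` avoids the loss of precision that comes from subtracting two nearly equal numbers this far out in the tail.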

In the article linked above, Spiegelhalter praises a couple of writers for getting the meaning of the p-value the right way round, but really these authors also fail to properly understand the matter. The passages, respectively from the BBC and from the Wall Street Journal, were:

> ...they had attained a confidence level just at the "five-sigma" point - about a one-in-3.5 million chance that the signal they see would appear if there were no Higgs particle.

and

> If the particle doesn't exist, one in 3.5 million is the chance an experiment just like the one announced this week would nevertheless come up with a result appearing to confirm it does exist.

While both these passages deserve praise for recognizing the p-value as the probability for the data, rather than the probability for the null hypothesis, they unfortunately are still not correct, for a reason that will bring me neatly in a moment to my second major point about the standard of reporting results such as these in terms of p-values. The reason these passages are wrong is that they employ an overly restrictive interpretation of the null hypothesis. They assume H_{0} is the same as "the Higgs boson does not exist," whereas the true meaning is broader: "there is no systematic cause of any patterns present in the data." If we reject the null hypothesis, we still have other possible non-Higgs explanations to rule out before we are sure that the Higgs boson is not a fantasy. These alternatives include other particles, consistent with different physical models; systematic errors in the measurements; and scientific fraud.

This might seem like a subtlety not really worth complaining about, but along with the other errors mentioned, it is a perpetuation of the fallacies surrounding the frequentist hypothesis tests - fallacies that mask the fact that the p-value is a poor summary of a data set, as I have explained in The Insignificance of Significance Tests. Why, for example, is it considered so important to rule out the null hypothesis so comprehensively, when there is little or no attempt to quantify the other alternative hypotheses? The possibility that the observation is a new particle, but not the Higgs, will no doubt be investigated substantially, but presumably will never be presented as a probability. The probability for fraud or systematic error will presumably receive absolutely no formal quantification at all.
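To see why rejecting the null hypothesis does not by itself single out the Higgs, it helps to write Bayes' theorem over the whole hypothesis space. As a sketch, assume a finite set of mutually exclusive and exhaustive hypotheses H_0, H_1, ..., H_n, where H_0 is the null and the others stand for the alternatives listed above (the indexing is mine, for illustration only):

$$
P(H_0 \mid D) = \frac{P(D \mid H_0)\, P(H_0)}{\sum_{i=0}^{n} P(D \mid H_i)\, P(H_i)}
$$

A tiny P(D | H_0) shrinks the numerator, but the posterior for any one alternative, the Higgs included, still depends on the prior and the likelihood of every other H_i in the denominator. The p-value on its own says nothing about how the remaining probability is shared out among them.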

Recently, we had a good example of how calculating the probability for measurement error could have prevented substantial trouble. As Ted Bunn has explained, it was a simple matter to estimate this quantity when researchers publicized results suggesting superluminal velocities for neutrinos, yet several groups set up experiments to try to replicate those results, at significant expense, despite the obviousness of the conclusion.

The traditional hypothesis tests set out to quantify, in a rather backwards way, the degree of belief we should have in the null hypothesis, but this is not what we are really interested in. What we want to know when we look at the results of a scientific study is: what is the probability that the hypothesized phenomenon is real? For this, we should calculate a posterior probability. To do this, we need a specified hypothesis space with an accompanying prior probability distribution. This is one of the things that leads to great discomfort among some statistical thinkers, and was instrumental in leading to the adoption of the p-value as the standard measure of an experiment: how can you obtain a posterior probability without violating scientific objectivity? How can you assign a prior probability without introducing unacceptable bias into the interpretation of the observations?

The people who think like this, though, don't seem to realize that prior probabilities can be assigned in a perfectly rigorous mathematical way that will not introduce any unwarranted, subjective degree of belief. There seems to be a feeling like 'we don't know enough to specify the prior probability correctly,' but this is ridiculous - missing knowledge is exactly what probability theory is for. Any probability is just a formally derived, distilled summary of existing information. To calculate the probability for the Higgs, in principle all you need to do is start with a non-informative prior (e.g. uniform), figure out all the relevant information, and bit by bit account for each piece of information using Bayes' theorem. Sure, the bit about figuring out the relevant information is hard, but all those clever particle physicists at CERN must be able to manage it, if they put their minds to it. Then it is just a matter of investigating how well the various hypotheses, with and without the Higgs boson, account for various observations, and how efficiently they do so. We'll get a glimpse of how this is done, in a future post, when I get around to discussing model comparison, and a formal implementation of Ockham's razor.
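The procedure just described (uniform prior, then one application of Bayes' theorem per piece of information) can be sketched for a small discrete hypothesis space. Everything specific here is hypothetical: the hypothesis labels, the likelihood values, and the function names `bayes_update` and `sequential_posterior` are invented purely to show the mechanics, and the sketch assumes the pieces of information are conditionally independent given each hypothesis. It is in no way an analysis of the CERN data.

```python
from typing import Dict, List


def bayes_update(prior: Dict[str, float],
                 likelihood: Dict[str, float]) -> Dict[str, float]:
    """Apply Bayes' theorem once: posterior is proportional to prior * likelihood."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(D), the normalizing constant
    if evidence == 0.0:
        raise ValueError("data are impossible under every hypothesis")
    return {h: w / evidence for h, w in unnormalized.items()}


def sequential_posterior(hypotheses: List[str],
                         likelihoods: List[Dict[str, float]]) -> Dict[str, float]:
    """Start from a uniform prior and fold in each piece of information in turn."""
    posterior = {h: 1.0 / len(hypotheses) for h in hypotheses}
    for likelihood in likelihoods:
        posterior = bayes_update(posterior, likelihood)
    return posterior


if __name__ == "__main__":
    # Hypothetical hypothesis space, echoing the alternatives named earlier.
    hypotheses = ["noise_only", "higgs", "other_particle", "systematic_error"]
    # Hypothetical values of P(D_i | H) for two pieces of information.
    likelihoods = [
        {"noise_only": 3e-7, "higgs": 0.5, "other_particle": 0.5,
         "systematic_error": 0.1},
        {"noise_only": 0.5, "higgs": 0.6, "other_particle": 0.2,
         "systematic_error": 0.05},
    ]
    for name, prob in sequential_posterior(hypotheses, likelihoods).items():
        print(f"P({name} | D) = {prob:.3g}")
```

Even in this toy version, the first piece of information all but eliminates the noise-only hypothesis while leaving the Higgs and the other-particle hypothesis tied; it takes further information to separate them.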

Yes, there will always need to be assumptions made in order to carry out such a program, but inductive inference is impossible without assumptions (if you don't believe me, try it!), and any competent scientist (almost by definition) will be able to keep the assumptions used to those that either are accepted by practically everybody, or make negligible difference to the result of the calculation. It is the frequentists who are committing a fallacy by trying to learn in a vacuum.

I find it a great pity that the posterior distribution was not the route chosen for reduction of the data from the Large Hadron Collider. Firstly, this is an extremely fundamental question, and if any question deserves the best possible answer from the available data, this is it. The p-value simply doesn't extract all the information from the observations we now have at our disposal. Secondly, yes: it is a hugely complicated problem to assign a prior probability to a hypothesis like this, but this is a project involving a huge number of people, presumably including some of the finest scientific minds around - if anybody can do it, they can. If they did, it would first of all prove that it can be done, and secondly it would lay a very strong foundation for the development of general techniques and standards for the computation of all manner of difficult-to-obtain priors.
