Thursday, August 3, 2017

Standard Error

In the quantification of uncertainty, there is an important distinction that's often overlooked. This is the distinction between the dispersion of a distribution, and the dispersion of the mean of the distribution. 

By 'dispersion of a distribution,' I mean how poorly the mass of that probability distribution is localized in hypothesis space. If half the employees in Company A are aged between 30 and 40, and half the employees in Company B are aged between 25 and 50, then (all else equal) the probability distribution over the age of a randomly sampled employee from Company B has a wider dispersion than the corresponding distribution for Company A.

A common measure of dispersion is the standard deviation, which is the root-mean-square distance between all the parts of the distribution and the mean of that distribution.
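As a rough illustration of the distinction, the sketch below computes both quantities for one sample. The ages are simulated, hypothetical data (not taken from either company above), and the function names are my own. The standard deviation is computed as the root-mean-square deviation from the mean, and the dispersion of the mean is estimated with the usual standard error formula.

```python
import math
import random


def std_dev(values):
    """Root-mean-square distance of the values from their mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def standard_error(values):
    """Estimated dispersion of the sample mean: sqrt(sample variance / n)."""
    n = len(values)
    mean = sum(values) / n
    sample_variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(sample_variance / n)


# Hypothetical data: 400 employee ages drawn uniformly between 25 and 50.
random.seed(0)
ages = [random.uniform(25, 50) for _ in range(400)]

spread_of_ages = std_dev(ages)         # dispersion of the distribution
spread_of_mean = standard_error(ages)  # dispersion of the mean

print(f"standard deviation of ages: {spread_of_ages:.2f}")
print(f"standard error of the mean: {spread_of_mean:.2f}")
```

The second number is much smaller than the first, and it shrinks further as the sample grows, while the first does not.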

Saturday, October 31, 2015

Multi-level modeling

In a post last year, I went through some inference problems concerning a hypothetical medical test. For example, using the known rate of occurrence of some disease, and the known characteristics of a diagnostic test (false-positive and false-negative rates), we were able to obtain the probability that a subject has the disease, based on the test result.

In this post, I'll demonstrate some hierarchical modeling, in a similar context of medical diagnosis. Suppose we know the characteristics of the diagnostic test, but not the frequency of occurrence of the disease. Can we figure this out from a set of test results?
A medical screening test has a false-positive rate of 0.15 and a false-negative rate of 0.1. One thousand randomly sampled subjects were tested, resulting in 213 positive test results. What is the posterior distribution over the background prevalence of the disease in this population?
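One way to approach this question numerically is a grid approximation. The sketch below is only that, a sketch: it assumes a uniform prior over the prevalence and a binomial likelihood for the number of positive results, neither of which is stated in the problem, and the function names are my own. It uses the fact that the probability of a positive result is prevalence × (1 − false-negative rate) + (1 − prevalence) × false-positive rate.

```python
import math

FALSE_POSITIVE = 0.15
FALSE_NEGATIVE = 0.10
N_TESTED = 1000
N_POSITIVE = 213


def positive_rate(prevalence):
    """Probability that a randomly sampled subject tests positive."""
    return prevalence * (1 - FALSE_NEGATIVE) + (1 - prevalence) * FALSE_POSITIVE


def grid_posterior(n_grid=2001):
    """Posterior over prevalence on an even grid, assuming a uniform prior."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    log_like = []
    for p in grid:
        q = positive_rate(p)  # always between 0.15 and 0.9, so the logs are safe
        log_like.append(
            N_POSITIVE * math.log(q) + (N_TESTED - N_POSITIVE) * math.log(1 - q)
        )
    peak = max(log_like)
    weights = [math.exp(ll - peak) for ll in log_like]  # rescale before exponentiating
    total = sum(weights)
    return grid, [w / total for w in weights]


grid, posterior = grid_posterior()
mode = grid[posterior.index(max(posterior))]
mean = sum(p * w for p, w in zip(grid, posterior))
print(f"posterior mode: {mode:.3f}, posterior mean: {mean:.3f}")
```

Under these assumptions the posterior peaks near (0.213 − 0.15) / 0.75 = 0.084, the prevalence at which the expected fraction of positive results matches the observed fraction.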

Saturday, April 25, 2015

Mean vs median - a careful balancing act

Two common measures of the location of a probability distribution are the mean and the median. While they are, in general, quite different things, some familiar distributions have their mean and median at the same point (every symmetric distribution with a well-defined mean is of this kind, though the converse does not hold; see comment, below).

The mean of a distribution, as we all know, is its average, while the median is, roughly speaking, the point at which the amount of probability mass to one side is the same as the amount on the other side. Upon hasty consideration, these definitions can appear to denote the same thing, and so confusion between the two concepts is common. Annoyingly, my own PhD thesis contains a sentence [1] that explicitly confuses the mean for the median. Furthermore, none of the half dozen eminent scientists whose job it was to assess my thesis (and who otherwise all did an excellent job!) reported noticing this blunder.
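A small sketch with made-up numbers shows how the two definitions come apart. The data sets and helper names here are hypothetical, chosen only for illustration. The mean is the point about which the signed distances cancel, while the median is the point with equal counts on either side.

```python
import statistics


def balance_residual(values, point):
    """Sum of signed distances from `point`; zero when `point` is the mean."""
    return sum(v - point for v in values)


def mass_each_side(values, point):
    """Number of values strictly below and strictly above `point`."""
    below = sum(1 for v in values if v < point)
    above = sum(1 for v in values if v > point)
    return below, above


symmetric = [1, 2, 3, 4, 5]   # hypothetical symmetric data
skewed = [1, 2, 3, 4, 100]    # same data with one extreme value

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(name, "mean:", mean, "median:", median)
    print("  mass either side of median:", mass_each_side(data, median))
    print("  mass either side of mean:  ", mass_each_side(data, mean))
```

For the skewed data the median stays at 3 while the mean moves to 22, so four of the five values lie below the mean and only one above it.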

Confusion between the mean and the median is highly analogous to a difficulty experienced by many young children when they try to balance asymmetric blocks on top of one another, as has been reported by cognitive scientist Annette Karmiloff-Smith [2].

Saturday, April 18, 2015

The Fundamental Confidence Fallacy

The title of this post comes from an excellent recent paper (as far as I can tell, still in draft form) on misunderstandings of confidence intervals. The paper, 'The fallacy of placing confidence in confidence intervals', by R. D. Morey et al. [1], comes from almost exactly the same set of authors whose earlier paper on a very similar topic I criticized previously, but the current paper does a far better job of explaining the authors' position and arguing for it.

The authors identify the fundamental confidence fallacy (FCF) as believing automatically that,
If the probability that a random interval contains the true value is X%, then the plausibility (or probability) that a particular observed interval contains the true value is also X%.

Friday, December 12, 2014

Science is for Everyone

In the previous post, I explained that science is suitable for investigating all matters. Pursuing a similar theme, I want now to discuss how science is for all people, not just bearded academics with white lab coats. (Pardon the stereotype, and let me emphasize that there is no good reason why 50% of all scientists should not be women.)

I mentioned something in that last post that is also central to this discussion: scientific method is a graded affair - not black or white. Whatever we can learn by implementing a low level of scientific rigour, we can learn a little more, in a little more detail, and with a little more confidence, by applying a slightly more systematic procedure.


It perplexes me that the word 'scientism' is predominantly used as a slur to put people down and criticize their world view and methodology. I realized something recently, however, that helped me understand the error that is often being made, and how that error compounds the problem that is often being called out when people make the accusation of scientism.

First off, let's settle what scientism is. Wikipedia gives a good definition that fits well with the contexts in which I see the term used:
Scientism is belief in the universal applicability of the scientific method and approach, and the view that empirical science constitutes the most authoritative worldview or most valuable part of human learning to the exclusion of other viewpoints.  

Saturday, November 8, 2014

Probability Trees and Marginal Distributions

In a blog post earlier this year about medical screening, 'On the hazards of significance testing. Part 1: the screening problem', statistical expert David Colquhoun demonstrates a simple way of visualizing the structure of certain probabilistic problems. This diagram, which we might call a probability tree, makes the sometimes counter-intuitive solutions to such problems far easier to grasp (and in the process, helps put over-inflated claims about the effectiveness of screening into perspective).
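To give a feel for what such a tree encodes, here is a minimal sketch of a two-level tree for a screening test. The prevalence, sensitivity and specificity used are hypothetical values of my own choosing, not figures quoted from Colquhoun's post, and the function names are likewise my own. Each leaf of the tree is the product of the probabilities along its branches, and a marginal probability is the sum over all leaves sharing a given test result.

```python
def probability_tree(prevalence, sensitivity, specificity):
    """Joint probability of each (condition, test result) leaf of the tree."""
    return {
        ("disease", "positive"): prevalence * sensitivity,
        ("disease", "negative"): prevalence * (1 - sensitivity),
        ("healthy", "positive"): (1 - prevalence) * (1 - specificity),
        ("healthy", "negative"): (1 - prevalence) * specificity,
    }


def marginal(tree, result):
    """Marginal probability of a test result, summed over both conditions."""
    return sum(p for (_, r), p in tree.items() if r == result)


# Hypothetical screening scenario (assumed values, for illustration only).
tree = probability_tree(prevalence=0.01, sensitivity=0.80, specificity=0.95)

p_positive = marginal(tree, "positive")
p_disease_given_positive = tree[("disease", "positive")] / p_positive

print(f"P(positive)           = {p_positive:.4f}")
print(f"P(disease | positive) = {p_disease_given_positive:.3f}")
```

With these assumed numbers, most positive results come from the large healthy branch of the tree, which is why the probability of disease given a positive result stays well below the sensitivity of the test.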