Comments on Maximum Entropy: Fly papers and photon detectors: another base-rate fallacy

The likelihood is indeed a function of n, but it i...

2013-07-11T21:18:17.430-05:00

The likelihood is indeed a function of n, but it isn't P(n|csp). It is the probability of having observed c counts given n emissions (along with everything else we know), P(c|nsp).

Cheers for the response. Forgive me if I walk thr...

2013-07-11T20:52:35.885-05:00

Cheers for the response. Forgive me if I walk through this in some detail, it's been a few years since I've studied this, so it's more for my own benefit than anything else :)

You can actually take your equation 7 and turn it into my factorization. Starting with your equation 7, using slightly different notation since I can't figure out how to write a phi character:

p(n|p,c,s) = p(n|p,s) * p(c|n,p,s) / p(c|s,p)

and multiply through by p(c|s,p) to get:

p(n|p,c,s)*p(c|s,p) = p(n|s,p) * p(c|n,p,s)

By the rules of conditional probability -- p(a,b) = p(a|b)*p(b) -- we can combine the left-hand side to get:

p(n,c|s,p) = p(n|s,p) * p(c|n,p,s)

at which point we note that n is completely independent of p (since our emitter doesn't care about the detector) and that c is independent of s conditional on n (for the reasons you give), and we get my factorization of the likelihood:

p(n,c|s,p) = p(c|n,p)*p(n|s)

correct? I don't think there's anything particularly frequentist or Bayesian at this point, it's just some algebra. At this point, c is fixed (or observed, anyway), as are s and p, so our likelihood for n becomes L(n|c,s,p)=p(c|n,p)*p(n|s)

Where I believe you go astray in your frequentist analysis is when you use the fact that the random variable c is conditionally independent of the parameter s to treat the factor p(c|n,p) as though it were completely independent of the p(n|s) factor; n is precisely the term you are trying to 'twiddle' to maximize your likelihood. Different values of n produce different values of both p(n|s) and p(c|n,p), and so you cannot remove p(n|s) from your optimization without biasing your answer.
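As a rough numerical illustration of the point above, the sketch below scans n and compares the maximizer of the detection term p(c|n,p) on its own with the maximizer of the product p(c|n,p)*p(n|s). The values c = 15, p = 0.1 and s = 100 are assumptions inferred from the numbers used elsewhere in these comments, and the grid upper bound of 400 is an arbitrary choice.

```python
import numpy as np
from scipy.stats import binom, poisson

# Assumed values, inferred from the numbers quoted in this thread
c, p, s = 15, 0.1, 100.0  # observed counts, detector efficiency, expected emissions

n = np.arange(c, 400)  # candidate numbers of emitted photons (arbitrary upper bound)

# log p(c|n,p): binomial detection term only
log_detect = binom.logpmf(c, n, p)
# log p(n|s): Poisson emission term
log_emit = poisson.logpmf(n, s)

# Maximizer when the Poisson term is dropped, and when it is kept
n_detect_only = int(n[np.argmax(log_detect)])
n_joint = int(n[np.argmax(log_detect + log_emit)])
print(n_detect_only, n_joint)
```

Both maxima are two-way ties on paper (149 and 150 without the Poisson term, 104 and 105 with it), so which member of each pair gets printed depends on floating-point rounding.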

If you want to argue that incorporating the prior into the likelihood is what makes it a "Bayesian" analysis, then I think we're off to a different (semantic) discussion, which probably won't be terribly fruitful :)

In any event, even with my frequentist hat on, I completely agree that it would be incredibly foolish to ignore the prior information on n given by p(n|s); I'd just go one step farther and say that a valid frequentist analysis that pays attention to the dependence structure of the problem *can't* ignore it.

Thanks for dropping by. I am pleased when people...

2013-07-11T17:25:11.888-05:00

Thanks for dropping by.

I am pleased when people want to interact with the blog, but what you have done is the Bayesian calculation (look at eq 8). There are a few errors in your reasoning, but your factorization of the likelihood function is the crucial one. The quickest way to see this is that if your algebra is correct, then the Bayesian calculation has to be wrong, as I would be missing another factor of P(n|s) (which would be very weird indeed).

Agree with the previous comment; an interesting po...

2013-07-11T16:01:09.977-05:00

Agree with the previous comment; an interesting post, and I've enjoyed browsing through the rest of your writing (got here by way of "Probably Overthinking It", which is why I'm so late to the party).

So, putting my frequentist hat on, I'm concerned that the justification you use for eliminating the Poisson term from the likelihood function when calculating the MLE might not be quite correct, and this might be why you end up giving the ML method a failing grade, when it doesn't deserve it, at least this time :)

If you write out the joint likelihood, you have L(c|n,p,s) = p(c|n,p)*p(n|s). I completely agree that -- if you fix a value for n -- p(n|s) becomes constant and no longer has anything interesting to contribute to the optimization. However, in that case, n must also be fixed in p(c|n,p), since it is the same term, meaning you can no longer maximize your overall likelihood with respect to it; the only term that you could estimate in that case (assuming c is also observed) would be p, if you didn't already know it.

In other words, c is independent of s given n, but n can never be independent of s since both n and s appear together in the p(n|s) factor. This means that you can't drop the Poisson prior from your estimation problem when you're calculating the MLE for n.

I was too lazy to do it symbolically, so I just brute-forced the analysis. If you retain the Poisson term, both n=104 and n=105 have the same likelihood. This seems consistent with your Bayesian analysis, and is certainly nowhere near 150 (which, if an analysis did lead to that result, would, I agree, be plainly silly).

There are certainly several good reasons to still prefer a Bayesian approach (e.g. not needing to invoke asymptotic normality results to obtain a good credible set for n), and certainly other places where MLEs give odd results, but I believe that the MLE for n in this problem is split between 104 and 105 rather than 150, since dropping the Poisson term is not permissible given the independence structure of the problem.
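For anyone who wants the symbolic version, a ratio test on the joint likelihood seems to give the same tie. This is only a sketch, assuming a binomial detection term with efficiency p, a Poisson emission term with mean s, and the values c = 15, p = 0.1, s = 100 implied by the numbers in this thread:

```latex
% Joint likelihood: binomial detection term times Poisson emission term
L(n) = \binom{n}{c} p^{c} (1-p)^{n-c} \cdot \frac{s^{n} e^{-s}}{n!}
     = \frac{p^{c} s^{c} e^{-s}}{c!} \cdot \frac{[s(1-p)]^{\,n-c}}{(n-c)!}

% Ratio of successive terms
\frac{L(n+1)}{L(n)} = \frac{s(1-p)}{n+1-c}

% With s = 100, p = 0.1, c = 15:
\frac{L(n+1)}{L(n)} = \frac{90}{n-14}
\begin{cases}
  > 1, & n < 104 \\
  = 1, & n = 104 \\
  < 1, & n > 104
\end{cases}
```

So L(n) rises up to n = 104, L(105) = L(104), and it falls afterwards, which matches a maximum shared between 104 and 105.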

The python code I used to calculate the (log) MLE (less a few multiplicative constant terms) is below, if you are interested, or if you can see where I may have made a mistake.

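>>> # scipy.misc.factorial has been removed from SciPy; scipy.special.factorial replaces it
>>> from scipy.special import factorial
>>> import numpy as np
>>> # Likelihood up to constant factors: 0.9**(n-15) / (n-15)! from the binomial term, 100**n from the Poisson term
>>> lik = lambda x: (1. / factorial(x - 15)) * (0.9 ** (x - 15)) * (100. ** x)
>>> # Sort candidate n by log-likelihood so the maximizers appear last
>>> print(sorted([(x, np.log(lik(x))) for x in range(50, 120)], key=lambda x: x[1]))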
[ ... many omitted results ... (104, 155.90778349933589), (105, 155.90778349933589)]

Thanks, Mike, I'm glad to see you had some fun...

2013-01-05T23:51:42.480-06:00

Thanks, Mike, I'm glad to see you had some fun with the numbers.

It's sort of a straw man, in that clearly no 'good' statistician of any persuasion is going to make that mistake in this kind of ultra-extreme case, but the fact remains that several highly respected texts on statistics claim quite explicitly that ML is the only technique required for parameter estimation.

Even if nobody really believes that, it still makes clear the weirdness of a methodology that needs to be patched up, ad hoc, when common sense points out that its rules won't work. And what about intermediate cases, where the breakdown is not so obvious? Better to start with and understand the true logic, then apply the approximations when suitable, rather than assume the approximations always work, with tacit acceptance that sometimes the consequences will be ridiculous.

Beyond the behaviour of reasonably competent stats experts, there is also the fact that many experimental scientists actually do make errors like this in the lab. This is not helped by a general culture that is too uncritical of certain methodological traditions.

Hi Tom, this is a great post and I had fun working...

2013-01-05T17:58:55.751-06:00

Hi Tom, this is a great post and I had fun working through the problem. I ended up plotting the log of the posterior distribution against plausible values of n. Here's the python code I used: https://gist.github.com/4464308. And here's the output with a graph: http://goo.gl/GgFVY.

It's really cool that the posterior is just another Poisson distribution — I wouldn't have noticed that if you hadn't pointed it out. I'm not sure I really understand the intuition behind why this is so.
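One possible intuition is Poisson thinning: if each emitted photon is detected independently with probability p, the detected and undetected counts are themselves independent Poisson variables with means s*p and s*(1-p), so observing c says nothing about how many photons were missed. The sketch below checks numerically that the normalized posterior for n matches a Poisson distribution for n - c with mean s*(1-p). The values c = 15, p = 0.1 and s = 100 are assumptions taken from the numbers used in these comments, and the grid bound of 400 is arbitrary.

```python
import numpy as np
from scipy.stats import binom, poisson

# Assumed values, taken from the numbers used in this thread
c, p, s = 15, 0.1, 100.0  # observed counts, detector efficiency, expected emissions

n = np.arange(c, 400)  # grid of candidate n; the tail beyond 400 is negligible here

# Unnormalized posterior p(c|n,p) * p(n|s), then normalized over the grid
post = binom.pmf(c, n, p) * poisson.pmf(n, s)
post /= post.sum()

# Candidate closed form: the undetected count n - c is Poisson with mean s*(1-p)
shifted = poisson.pmf(n - c, s * (1 - p))

# Largest pointwise disagreement between the two distributions
max_abs_diff = np.max(np.abs(post - shifted))
print(max_abs_diff)
```

If the thinning picture is right, the printed difference should be at the level of floating-point noise.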

I do think your description of the technique of maximum likelihood is a bit of a strawman, though. I don't think any statistician, frequentist or no, would make that mistake on a problem this straightforward. What I learned as "maximum likelihood" is exactly what I ended up doing: use Bayes to write down P(n|phi,c,s), ignore anything that's not going to change where the maximum of the curve will be (the 0.1^c term, e.g.), take the log so it's easier to calculate, and then run some numbers through the resulting formula and look for the maximum, which is easy since there's only one parameter.

I'd be kind of curious to see how a frequentist would attack this problem.