Tuesday, June 25, 2013

Crime and Punishment

There’s been an idea circulating for some time that retributive justice is morally and logically founded upon the fact that we possess a thing called free will - some assumed weird mechanism that disconnects human behaviour from the normal cause-and-effect evolution of nature. If, after all, human actions were really ‘just’ the result of mechanistic microscopic processes, then whatever we do would be entirely determined by the laws of physics and the configuration of our environment. And if this were really so, then whatever somebody does, they could not have willfully done otherwise, in which case there is no sense in which a person can be blamed for doing wrong. And if culpability cannot be established, then doesn’t the validity of punishment look suspect? So prevalent is this idea that it forms a major part of contemporary legal philosophy.

Not only is this idea of free will completely nonsensical, but the connection between it and the justice of retribution is totally unfounded. Vengeance, after all, is really just an expression of anger. Is anger rational? Is it a reliable, systematic producer of well-judged behaviour? Or is it merely a crude and ancient heuristic moderator of human interaction that, in a modern, enlightened era, we could do with much less of?

There is simply no logical link between culpability and the righteousness of retributive punishment, which somehow ‘repays a debt to society.’ Try to derive this principle logically, and you will find it impossible without directly assuming the desired outcome among the required premises.

What we must see instead is that, in line with more agreeable consequentialist moral philosophies, the only appropriate consideration when assigning juridical interventions is: what actions will lead to a better society for us, and for our children to grow up in? In this case, the problem of justifying enforced treatment (e.g. imprisonment) upon somebody who ‘couldn’t have acted any other way’ disappears completely. The enforced treatment is only indirectly determined by the person’s actions, and is wholly derived from what we would like the world to look like in the future. The relevance of past behaviour is limited to the extent to which it serves as a predictor of future behaviour. What are traditionally viewed as punishments - justice administered for the satisfaction of the victims - become more properly viewed as treatments, designed to minimize the cost for society of a person’s demonstrated antisocial tendencies.

The desire for revenge against a person who has committed wrongs against us is likely to be at least partly genetic - natural selection favours self-preserving behaviour (it is advantageous for me to create an environment in which another’s bad behaviour toward me makes life uncomfortable for them) - but the idea linking this concept of justice to free will seems to be far more memetic than genetic: it is a matter of culture.

The concept that free will is necessary and sufficient to entail the punishment of moral failing seems to date back to Aristotle, in the Nicomachean Ethics. I’m no scholar of Aristotle, but to me it’s not clear whether for him the appropriateness of blame has a consequentialist or an absolutist foundation - are praise and blame desirable because they make certain modes of future behaviour more likely, or because they try to balance what has happened in the past?

If I had to speculate on the reason for the cultural success of the notion specifically linking retribution to free will, I’d guess that it was found to come in very useful when dictators wrestled with the seemingly contradictory goals of being loved, yet being utterly feared.

How can you be brutally violent against your enemies, while remaining admired by the remaining population? One way would seem to be to claim that violence against certain people is morally just, even necessary. “It made me cry to do that to him, but his crimes left me no choice.” Such pious adherence to absolute moral principle, even when it demands the most unpleasant actions, might even elevate a thug to saintly status, bringing joyous tears to the eyes of his devoted followers.

In the course of time, it may be that neuroscience, experimental psychology, and the social sciences will come to the conclusion that a better society is generally one in which people’s innate desire for vengeance is somewhat fulfilled (I doubt this, as I’ll explain shortly), but this would not undermine the principle that treatment of criminals should be determined on purely consequentialist grounds. If it happened to be that this desire was so strong, and so innate that no amount of cultural evolution could remove it, and that the frustration of unplacated victims of crime was so intense as to threaten civil unrest, then a retributive element may need to be restored, but the ultimate reasoning would be the rational evaluation of different courses of action, and selection in favour of those strategies determined to be in society’s best interests.

The debate between absolutist and consequentialist moral philosophies has been going on for a long time: consequentialism goes at least as far back as Machiavelli, around 500 years ago. Absolutism goes much further back, and persists still. This is really quite surprising - it’s not a difficult problem to solve. All morality is manifestly consequentialist, no matter what we might profess.

Wait a moment, ‘thou shalt not kill.’ It doesn't get much more absolutist than that, does it? No, it doesn't. But just how absolutist is that, exactly?

For starters, no society implements principles like this in the strict absolutist way. Christians believe that this basic rule, ‘thou shalt not kill,’ was handed to them by their personal deity: thou shalt not kill means that killing is absolutely wrong, under all circumstances - no exceptions allowed. It's never stopped Christian nations from going to war when they felt like it. It never prevented Christian inquisitors from burning people at the stake when the winter nights were dark and cold. All assumed absolutist principles have always been tacitly appended with a host of additional clauses beginning with the word ‘Unless...’ This is pure consequentialism.

Well, maybe those people adding their arbitrary ‘unless’ clauses were simply bad moralists. Thou shalt not kill is a good rule after all, right? Yes, typically. But what if the person you are invited to consider killing has a strong ambition to kill you at the earliest convenient moment? Or alternatively, what if that person suffers intolerably, with no hope of improvement, ever? Killing cannot be said to be categorically wrong under all circumstances - it all depends on the consequences.

Finally, absolutist versions of morality, in the sense that the content of the principle, “X is wrong,” takes precedence over the actual likely outcomes of performing X, are actually demonstrably incoherent. Lay aside the problem of what could possibly be the source of any absolute moral principle. Suppose for a moment that such principles really are set by some divine entity. What then? These moral laws are obviously not physical laws, since we have the capacity to systematically deviate (if we didn’t, they wouldn’t be called moral laws in the first place). Thus, somewhere in the process of our minds, decisions are made about whether or not to follow a particular moral principle at a particular time. If we believe that Godzilla will roast us alive for eternity if we fail to follow the rules, then those predicted consequences are what guide our behaviour. Moral decisions are always the result of a consequentialist evaluation of the options.

Going a little beyond the standard terminology, then, morality is absolute, but with only one rule: “whatever actions are revealed by a rational analysis to be most likely to bring me closer to achieving my goals are the actions I should implement.” This is exactly as I demonstrated in an earlier article on scientific morality. Furthermore, it illustrates that the founding principles of that argument, (1) goodness does not exist outside minds and (2) morality is doing what is good, are both properly basic: they are necessarily correct, and our knowledge of them is not contingent upon empirical observations.

Let’s get back to the potential role of retribution in an advanced consequentialist morality. The extent to which the will to see wrongdoers punished is genetically innate, as opposed to culturally transmitted, is certainly an interesting question, and one whose investigation would no doubt require some ingenious experimental protocols. But I strongly suspect that the innateness of these feelings is limited to an extent that can easily be overruled by rationality, allowing vengeance to be effectively eliminated from all consideration in the problem of dealing with criminals. There are several reasons for this suspicion.

Firstly, if we look at the portion of the population most commonly found expressing anger, I’m fairly sure it'll be small children. Anger is, we all recognize, a childish emotion. We grow out of it. We learn (with great relief to most, I presume) to control it, and when as adults we occasionally succumb to emotional outbursts, we typically feel silly afterwards. As advanced society has developed, we have continually learned, oh so painfully slowly, that anger and resentment typically achieve little except the propagation of more anger and resentment. 

Secondly, there seems to be considerable evidence showing that the traditional practices of retributive justice have failed miserably. This paper, for example, argues strongly that imprisonment is ineffective at reducing the frequency and intensity of crime, and that alternative treatments such as education achieve greater reductions of recidivism. Another article summarizes some of its findings: "Research into specific deterrence shows that imprisonment has, at best, no effect on the rate of reoffending and often results in greater rates of recidivism." The utilitarian advantages of a more rational approach seem to be there for the taking.

Thirdly, whatever memetic components there are, supporting any in-built tendency to desire vengeance, they can, by definition, be overcome by changing our culture.

Fourthly, religious leaders throughout history seem to have made artful use of the philosophy of free will in order to bolster acceptance of their reign of terror (hell doesn’t seem very fair, if all your actions are fixed by the way God set up the boundary conditions, and so damnation only gains a veneer of coherence if we have free will - a notion that evidently has to extend to the mortal plane, in order to justify certain historical hobbies of the major religions). This suggests that the hard-wired machinery of anger was, stripped of any socially conditioned props, insufficient to sustain the required levels of violence in our increasingly sophisticated culture.

When it comes to figuring out how to deal with crime, therefore, it is irrational to decide based on a shortsighted lust to see a criminal's debt repaid through suffering. Instead, we must look to scientific data to decide what courses of action minimize the costs to society. We must seek to understand what treatments will cost-effectively turn today's rule breakers into tomorrow's contributors to society, and what measures will economically eliminate the desire and the opportunity to commit crimes in the first place. 

Monday, June 17, 2013

Extreme values: P = 1 and P = 0

There is a popular folk theorem among some Bayesians, to the effect that it is unacceptable for a probability to be 0 or 1. There's a simple motivation for this principle: as rationalists, we demand the opportunity for nature to educate us by blessing us with novel observations. No matter how confident we become in some proposition, it should always be possible for us to change our minds when strong enough evidence accumulates in favour of some alternative. As Karl Popper rightly observed, after all, a theory that is invulnerable to falsification is not much of a theory.

But what happens if P(H | I) becomes zero? How is the probability for the hypothesis, H, to be updated by new evidence? If P(H | I) is 0 then the numerator in Bayes' theorem, prior times likelihood,

P(H | I) × P(D | HI)

is also 0, regardless of how convincing the data, D, may be. No matter what happens, the outcome is unchanged: a nice round zero for the posterior.

Similarly, if P(H | I) is 1, then for the converse hypothesis, P(~H | I) is necessarily 0. Now, the denominator in Bayes' theorem is 

P(H | I) × P(D | HI) + P(~H | I) × P(D | ~HI)

and when the second term (everything after the plus sign) is zero, both numerator and denominator in Bayes' theorem are the same, producing the ratio 1, for all eternity.
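The arithmetic can be checked directly. Here is a minimal sketch in Python (the function name and the particular likelihood values are my own, for illustration):

```python
def posterior(prior_h, lik_h, lik_not_h):
    """Bayes' theorem: P(H | D I) from the prior P(H | I) and the
    likelihoods P(D | H I) and P(D | ~H I)."""
    numerator = prior_h * lik_h
    denominator = prior_h * lik_h + (1 - prior_h) * lik_not_h
    return numerator / denominator

# Ordinary case: data 100 times more likely under H pulls a 50% prior up to ~99%.
print(posterior(0.5, 0.99, 0.0099))

# Prior of 0: the numerator is 0, so no data, however convincing, can move it.
print(posterior(0.0, 0.99, 0.0001))

# Prior of 1: numerator and denominator coincide, producing 1 for all eternity.
print(posterior(1.0, 0.5, 0.9999))
```

The middle and final calls return exactly 0 and exactly 1, whatever likelihoods we feed them - the stuck states described above.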

I have sympathy with this motivation, therefore, but as a general rule it is utter nonsense, the result of forgetting one of the most basic facts about how inference works. The mathematics I have just described is all correct, but there are other ways for us to change our minds, and retain our rationality.

A recent, brief discussion at another website drew my attention to an article by Eliezer Yudkowsky, in which he also argues that 0 and 1 are not probabilities. The argument is a little different: the amount of evidence (the likelihood ratio expressed in log-odds form) needed to update an intermediate probability to 0 or 1 is infinite. This infinite certainty is an absurdity, he claims, unable to be represented with real numbers, and so 0 and 1 aren't probabilities.
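That divergence is easy to exhibit numerically: in log-odds form, the evidence required to reach a target probability grows without bound as the target approaches 1. A sketch (the helper name and the choice of targets are mine):

```python
import math

def log_odds_bits(p):
    """Probability expressed as log-odds, in bits. Finite for 0 < p < 1,
    but diverges as p approaches 0 or 1."""
    return math.log2(p / (1 - p))

# Bits of evidence needed to lift a 50% prior (log-odds 0) to each target:
for target in [0.9, 0.99, 0.999999]:
    print(target, log_odds_bits(target))

# log_odds_bits(1.0) raises ZeroDivisionError: reaching certainty would
# require infinitely many bits of evidence.
```

Each extra nine in the target probability costs roughly another 3.3 bits, with no finite amount ever sufficing to reach 1 exactly.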

Yudkowsky, as many readers will know, is a well-regarded thinker and writer on the topic of applied rationality, and I can recommend his writing most highly. The overlap between his broad philosophy and mine is, I would say, very large, with the main difference that in cases where I lack mastery of the theoretical apparatus, he very often does not. Yudkowsky knows and understands the mind-projection fallacy better than the vast majority (see for example his article of the same name, and this followup), but in this instance, he seems to have forgotten it. It is essentially the same error made by all who claim that probabilities equal to zero or one should not enter one's calculations.

A little thought experiment, then, before resolving the paradox. Let H be the hypothesis that in some five-day interval, at some location on the Earth, the sun will rise on each of the five mornings. Let D represent the observation of the sun rising on the first of the mornings in question. What is P(D | HI)? I humbly submit that it is 1. Is H, therefore, not an appropriate, well-formed hypothesis? Is D not a valid observation? Evidently, if probability theory is to have any power at all, it must be capable of supporting hypotheses such as H, and data as trivial as D. It is not conceivable to have such things automatically ruled out under our epistemology.

In general, it is perfectly legal for P(D | HI) (or, for that matter, a posterior, like P(H | DI)) to be 0 or 1, but here's that basic fact about probability that we have to keep in mind: a probability cannot be divorced from the model within which it is calculated. A model may imply infinite certainty, without any person ever achieving that state (which would be impossible to encode in their brain, anyway). Our notation says something very important: P(D | HI), no matter what it is, is necessarily contingent upon the conjunction HI, which obviously depends on the truth of I. This is something we can never be absolutely certain of.

The all-important "I" that forms the foundation for every Bayesian calculation is usually said to stand for 'information' - all the relevant prior knowledge we have. Unfortunately, this creates a little trap that too many fall into, which is to forget that there is another component besides information needed before "I" is fully populated. "I" could just as easily stand for 'imagination.' To get Bayes' theorem to do any useful work for us, we have to specify a theoretical framework. We have to make certain assumptions, including specification of a full set of hypotheses against which H is to compete. To arrive at a candidate set of hypotheses, we must make a leap of the imagination. There is no possible criterion for judging whether or not all our assumptions are correct, and no way to know in advance whether we have chosen the 'correct' set of hypotheses. To think otherwise is just wishful thinking.

To think that the infinite confidence implied under some "I" represents the actual infinite confidence of some physical rational agent is the mind-projection fallacy. Instead, a probability is a model of the confidence a rational agent would have if "I" was known to be true. That this confidence might need to be modelled using a non-numeric concept such as infinity is merely an uncomfortable (though often highly convenient) mathematical fact.

And now we can see how it is that we can continue to accrue knowledge under the threat of the apparent epistemological cul-de-sac that is P = 1 or P = 0. To liberate ourselves from the straitjacket of "I", we simply need to recognize that what we now call "I" is itself merely a hypothesis in some broader hierarchical model. This is how model checking (wielding the analytical blade of model comparison) works, which, as I pointed out before, seems philosophically unpalatable to many, yet is in fact an essential ingredient in our inferential machinery. This is how we can come to look again at our theoretical framework and say 'hold on, I should be working with a different hypothesis space.' Novel theories and scientific revolutions would be impossible without this flexibility.

Some see this need in Bayesian epistemology to make assumptions in "I" that can't be established with certainty as a severe weakness, but it isn't - at least not one that can be avoided (no matter how many black belts we hold in the ancient art of self deception). We can always extend the scope of our hypothesis space so that some of our assumptions become themselves random variables in a wider inferential context, but to have all of them take on the role of hypotheses under test would require an infinitely deep hierarchy of models. In the example above, where H was a hypothesis about the sun rising, one might argue that a more sophisticated model would account for the possibility, however small, that my sensation of the sun rising was mistaken. Indeed, this is correct, and would prevent the likelihood function going to 1. Sooner or later, though, I'm going to have to introduce a definitive statement - one that supposes something to be definitely true - in order to avoid the intractable quagmire of infinite complexity.
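That move can be sketched concretely. Suppose the background model comes in two versions: I1, on which my sunrise sensation is perfectly reliable (likelihood exactly 1), and I2, on which it errs with some small probability. Treating the choice between them as a hypothesis in a wider model lets a single contrary observation overturn the 'certain' version. In this sketch, the prior over the two models and the error rate eps are illustrative assumptions of mine:

```python
def update(prior, lik, lik_alt):
    """One Bayesian update of P(model | observation) in a two-model comparison."""
    num = prior * lik
    return num / (num + (1 - prior) * lik_alt)

eps = 0.01     # illustrative error rate for the fallible model I2
p_I1 = 0.5     # illustrative prior over the two background models

# A correct sunrise report favours the infallible model I1 only very slightly:
p_I1 = update(p_I1, 1.0, 1 - eps)
print(p_I1)

# But a single missed sunrise falsifies I1 outright, since I1 assigned that
# observation probability zero - exactly the brittleness of a likelihood of 1:
print(update(p_I1, 0.0, eps))
```

The within-model certainty of I1 survives at the lower level, yet remains revisable at the higher one - which is all the folk theorem was ever really after.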

The early frequentists (and some still, in private communication with me) claimed that this subjectivity of Bayesian probability is its downfall, but in reality, it is impossible to learn in a vacuum. No kind of inference is possible without assumptions. Part of the beauty of Bayesian learning is that we make our assumptions explicit. The frequentists, of course, also make assumptions (see Yudkowsky, for example), but by refusing to acknowledge them, like the fabled ostrich sticking its head in the sand, they eliminate the possibility of examining whether or not they are reasonable, of understanding their consequences, or of correcting them when they are manifestly wrong.