Statistics is not boring. Really! It can be (and often is) made boring by bad teaching, but the underlying subject is fascinating -- drawing valid conclusions from seemingly noisy or random observations, by a process that can look like magic to the uninitiated. And like any applied mathematical technique, there's much more to it than just the mathematics itself. The correct application, and the explanation of that application, are equally important to get right. And so here Abelson describes his MAGIC approach to principled statistical arguments: the researcher needs to consider Magnitude, Articulation, Generality, Interestingness, and Credibility.
Statistical significance is not enough. It merely tells you whether your observations could have arisen by chance, or, as Abelson delightfully puts it:
But as he points out, this isn't very helpful, since p depends on the sample size: for any given non-zero effect, the bigger the sample, the more "statistically significant" the result. (This is particularly a problem in areas like computer science, where doing a few thousand, or even a few million, runs might be quite cheap, and ridiculously small p values can be obtained.) So it is good practice also to quote the effect size: not only show your observation is not due to chance, but show that the difference is big enough to get excited about.
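To make the sample-size point concrete, here is a minimal sketch (mine, not Abelson's). It assumes a two-sample z test with equal group sizes and known unit variance, so that the test statistic is simply z = d * sqrt(n / 2) for a standardised effect size d. The function names and the choice of d = 0.01 are purely illustrative.

```python
import math


def two_sided_p(d, n_per_group):
    """Two-sided p value for a two-sample z test.

    Assumes equal group sizes and known unit variance in both groups,
    so the standardised mean difference d gives z = d * sqrt(n / 2).
    """
    z = d * math.sqrt(n_per_group / 2.0)
    # For a standard normal Z, P(|Z| > z) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def p_values_by_sample_size(d, sizes):
    """Return (n, p) pairs for the same effect size d at each sample size."""
    return [(n, two_sided_p(d, n)) for n in sizes]


if __name__ == "__main__":
    tiny_effect = 0.01  # hypothetical, practically negligible effect
    for n, p in p_values_by_sample_size(tiny_effect, [100, 10_000, 1_000_000]):
        print(f"n per group = {n:>9,d}   d = {tiny_effect}   p = {p:.3g}")
```

The effect size is identical in every row; only the number of runs changes, yet p falls from nowhere near significance to vanishingly small. That is why the p value on its own says little about whether the difference is worth getting excited about.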
However, even having a significant p value might not mean what you think it means. Abelson describes The Replication Fallacy, which is:
Articulation is about reporting the results clearly, without getting lost in irrelevant and uninteresting details (but without fudging the important but inconvenient details). This can be difficult if the results are inconclusive or borderline. Abelson makes the very important, but often forgotten point that:
Even if the null hypothesis is refuted, it can still be hard to articulate the results. Abelson gives advice around "ticks and buts", which helps expose the interesting and relevant results.
All results are in some sense specific to the particular experiments run, but are actually only useful if they can be interpreted more generally. If every experiment can say only what those people did, on that day, under those circumstances, it's not very useful: we want to know what people in general will do, or even, why they do it.
This might require doing more experiments, particularly to test a theory. If your theory explains why something happens, you should be able to construct a situation in which it won't:
This fits in with the overall approach: statistical analysis isn't about a single isolated experiment, it's about a series of experiments advancing and changing the understanding, contributing to the lore.
Moreover, results should be interesting: they should change the way people think about the subject, they should be "surprising". What makes something interesting?
One needs to be careful here, however. There is folk theory that is "generally believed" but for which there is precious little evidence. The move for evidence-based medicine is based on this observation: there are things everybody knows that just ain't so. In these cases it is important to gather the evidence. I agree, evidence that disproves such folk theories is more interesting, but in some cases, it might not be more important. Nevertheless, in general this is good advice, and Abelson suggests a "surprisingness coefficient": how different the observed result is from what you expect it to be. If you expect the null hypothesis to hold (which, let's face it, you rarely do), then the surprisingness is the same as the effect size. But if you expect a big effect, and find it, that's not very surprising. If you expect an effect in one direction and find it in the other, that's the best of all!
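The three cases can be illustrated with a toy calculation. This is only a sketch of the idea as summarised here, not Abelson's actual coefficient (I haven't reproduced his formula): it assumes surprise can be measured as the absolute gap between the observed and the expected effect size, and the function name and the numbers are made up for illustration.

```python
def surprisingness(observed_effect, expected_effect=0.0):
    """Toy measure of surprise: the gap between observed and expected effect.

    Assumption for illustration only; this is not Abelson's own definition.
    Effects are signed, so an effect in the "wrong" direction counts as
    further from the expectation than no effect at all.
    """
    return abs(observed_effect - expected_effect)


if __name__ == "__main__":
    # Expecting the null: surprise is just the effect size.
    print(surprisingness(0.5, expected_effect=0.0))
    # Expecting a big effect and finding it: no surprise.
    print(surprisingness(0.5, expected_effect=0.5))
    # Expecting one direction and finding the other: most surprising.
    print(surprisingness(-0.5, expected_effect=0.5))
```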
Finally, we come on to credibility. If your results beggar belief, then your methodology is going to be attacked. You can counter this by guarding against the well known problems (which grow in number as the lore progresses), but eventually, if your result is just too unbelievable, you are probably going to have to come up with a corresponding theory that is better than the prevailing one. Anomalous results are not enough to overturn the world: you have to replace it with something better.
As well as the MAGIC chapters, there is a lovely chapter entitled "On Suspecting Fishiness" (who could resist buying such a book?), which highlights some things to look out for that could indicate error, or worse, fraud. It includes an interesting little tale about Mendel and his pea plants. We've all been brought up on the story that Mendel cheated, and massaged his results to get the answer he wanted, caught out by clever statisticians who showed his answers were just too good to be true. Well, maybe the story isn't as clear. Here Abelson recounts a different analysis that shows Mendel might have got his results by using a dodgy statistical procedure: not such a heinous crime after all, particularly as statistics wasn't the well-developed subject in the mid-1800s that it is today.
This is not an introductory text: it assumes you know about t tests and p values, etc. But it is also not particularly mathematical -- it is simply full of good, clear-headed advice, wisdom even, on doing statistics properly. Although it is written from the perspective of a psychologist analysing experiments done in noisy environments on a small number of subjects, its advice is a must-read for anyone using statistics seriously.