A guide for consumers and the media
To Mark Twain’s hierarchy of lies, damned lies and statistics, we should really add epidemiological lies, those reports showing that brown rice or trans-palmitoleic acid will prevent diabetes or that diet soda will make you fat, the kind that appear every week or so on ABCNews. (I mean the generic media, but ABCNews and I have a close relationship: sometimes they even print what I tell them). If you’ve been eating white rice instead of brown rice and you develop diabetes ten years later, it is the fault of your choice of rice. Everybody knows that this is ridiculous, but the data are there showing an almost 4-fold increased risk, so how can you argue with the numbers?
These kinds of studies are always based on associations, and the authors are usually quick to tell you that association doesn’t mean causality, even as they interpret the data as a clear guide to action (“Substitution of whole grains, including brown rice, for white rice may lower risk of type 2 diabetes.”). In fact, to most scientists, association can be a strong argument for causality. That is not what’s wrong with these studies. Philosophically speaking, there are only associations. All we really know is that there is a stream of particles and there is an association between the presence of a magnet and the appearance of a spot on a piece of photographic paper (anybody remember photographic paper?). God does not whisper in your ear that the particle has a magnetic moment. It is the strength of the idea behind the association, and the presentation of that idea, that determines whether the association implies causality. What most people really mean is that “association does not necessarily imply causality; you may need more information.”

What’s wrong with the rice story is that the idea lacks common sense. The idea that the type of rice you eat has any meaningful impact by itself, or that one could even guess whether its effect within a general lifestyle would be positive or negative, is absurd. But what about the statistics? Here the problem is really the presentation of the data. The number of papers in the literature pointing out errors in the interpretation of statistics is very large, although it is still smaller than the number of papers making those errors. There are numerous problems and many examples, but let’s look at the simplest case: the limitations of reporting relative risk, and the alternatives to it.
Here’s a good example cited in a highly recommended popular statistics book, Gerd Gigerenzer’s “Calculated Risks.” He discusses a real case, the West of Scotland Coronary Prevention Study (WOSCOPS), which compared the statin drug pravastatin to placebo in people with high cholesterol. The study was started in 1989 and went on for about 5 years. (These days, I think you can only compare different statins; everybody is so convinced that they are good that a placebo would be considered unethical):
1. First, the press release: “People with high cholesterol can rapidly reduce… their risk of death by 22 per cent by taking…pravastatin.”
2. Now, ask yourself what this means. If 1000 people with high cholesterol take pravastatin, how many will be saved from a heart attack that might otherwise have killed them? Think about this, then look at the data, the data that should have been reported in the media.
3. The data (deaths during 5 years, per 1000 people with high cholesterol):

pravastatin: 32
placebo: 41
Right off, it doesn’t look as good as you might have thought. Heart disease is a major killer overall, but if you take a thousand people and watch them for five years, not that many of them die from a heart attack. Now there are three standard ways of representing the data.
4. Data presentation – Relative risk reduction.
Risk is the number of cases divided by the total number of people in the trial (or risk per total number). So the risk for the 1000 people on the drug = 32/1000 = 3.2 %, and similarly the risk for the people on placebo = 41/1000 = 4.1 %. The risk reduction for comparing treatments is the difference between the two risks. The relative risk reduction here is just the reduction in risk divided by the risk for the placebo:
Risk reduction (number of people saved per thousand) = 41 - 32 = 9. Saving 9 lives per thousand doesn’t sound that great, but let’s get the per cent as reported.
Relative risk reduction = 9/41 = 22 %, as indicated in the press release, and it does sound like a big deal. But there are other ways to look at the data.
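If you want to check the arithmetic, here is a minimal Python sketch of the relative risk reduction calculation (the figures are the WOSCOPS numbers above; the variable names are just mine):

```python
# WOSCOPS figures quoted above: deaths over 5 years per 1000 people
deaths_drug = 32     # pravastatin group
deaths_placebo = 41  # placebo group
n = 1000             # people per group

risk_drug = deaths_drug / n        # 0.032, i.e. 3.2 %
risk_placebo = deaths_placebo / n  # 0.041, i.e. 4.1 %

# Relative risk reduction: the reduction in risk divided by the placebo risk
rrr = (risk_placebo - risk_drug) / risk_placebo
print(f"Relative risk reduction: {rrr:.0%}")  # prints 22%
```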
5. Data presentation – Absolute risk reduction. Again, you start with risk, the number of cases divided by the total number, but you keep the actual fraction. The absolute risk reduction is the difference between the two fractions.
For pravastatin, risk = 32/1000
For placebo, risk = 41/1000
Absolute risk reduction = (41/1000) – (32/1000) = 9/1000 = 0.9 % (less than 1 %)
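The absolute risk reduction is even simpler to compute; a sketch with the same numbers:

```python
# Same WOSCOPS figures, risks kept as actual fractions
risk_drug = 32 / 1000
risk_placebo = 41 / 1000

# Absolute risk reduction: the plain difference between the two fractions
arr = risk_placebo - risk_drug
print(f"Absolute risk reduction: {arr:.1%}")  # prints 0.9%
```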
6. Data presentation – Number needed to treat (NNT): This is a good indicator of outcomes. If you treat 1000 people, 9 will survive who would otherwise have died. So,
number that you have to treat to save one life = NNT = 1000/9 ≈ 111 people.
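The NNT is just the reciprocal of the absolute risk reduction; again a minimal sketch with the same numbers:

```python
# Same WOSCOPS figures
risk_drug = 32 / 1000
risk_placebo = 41 / 1000

# Number needed to treat: the reciprocal of the absolute risk reduction,
# i.e. how many people you must treat for one of them to be saved
nnt = 1 / (risk_placebo - risk_drug)
print(f"Number needed to treat: {nnt:.0f}")  # prints 111
```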
7. Conclusion: 22 % risk reduction is true enough, but it doesn’t really tell you what you want to know. Cutting to the chase: would you take a statin if you had high cholesterol (more than about 250 mg/dl) and, as in WOSCOPS, no history of heart attacks? On the basis of this study alone, it’s not clear. First, the risk is low. There is clearly a benefit, but how predictable is that benefit? In the study, 99 % of the people had no benefit from the drug. Of course, if you are the one out of a hundred (more precisely, one of the 9 out of 1000), the drug would be a good thing. The question is not easy to answer, but the point of what’s written here is that the statistics as reported in the media might have led you to jump to conclusions. Before you jump, though, you might ask about side effects. That is a complicated subject: although the side effects are rare, their incidence is not zero and they can be severe. But this post is only about the statistics.