Asher Peres was a physicist, an expert in information theory, who died in 2005 and was remembered for his scientific contributions as well as for his iconoclastic wit and ironic aphorisms. One of his witticisms was that “unperformed research has no results.” Peres had undoubtedly never heard of intention-to-treat (ITT), the strange statistical method that has appeared recently, primarily in the medical literature. According to ITT, the data from a subject assigned at random to an experimental group must be included in the reported outcome data for that group even if the subject does not follow the protocol, or even if they drop out of the experiment. At first hearing, the idea is counter-intuitive if not completely idiotic – why would you include people who are not in the experiment in your data? – suggesting that a substantial burden of proof rests with those who want to employ it. No such obligation is usually met, and particularly in nutrition studies, such as comparisons of isocaloric weight-loss diets, ITT is frequently used with no justification and is sometimes demanded by reviewers. Not surprisingly, there is a good deal of controversy on this subject. Physiologists or chemists, hearing this description, usually walk away shaking their heads or immediately come up with one or another obvious reductio ad absurdum, e.g., “You mean, if nobody takes the pill, you report whether or not they got better anyway?” That’s exactly what it means.
On the naive assumption that some people really didn’t understand what was wrong with ITT — I’ve been known to make a few elementary mistakes in my life — I wrote a paper on the subject. It received negative, actually hostile, reviews from two public health journals — I include an amusing example at the end of this post. I even got substantial grief from Nutrition & Metabolism, where I was the editor at the time, but where it was finally published. The current post will be based on that paper and I will provide a couple of interesting cases from the medical literature. In the next post I will discuss a quite remarkable new instance — Foster’s two-year study of low-carbohydrate diets — of the abuse of the common sense that is the major alternative to ITT.
To put a moderate spin on the problem, there is nothing wrong with ITT if you explicitly say what the method shows — the effect of assigning subjects to an experimental protocol; the title of my paper was “Intention-to-treat. What is the question?” If you are very circumspect about that question, then there is little problem. It is common, however, for the Abstract of a paper to correctly state that patients “were assigned to a diet,” but by the time the Results are presented, the independent variable has become not “assignment to the diet” but “the diet,” which most people would assume meant what people ate rather than what they were told to eat. Caveat lector. My paper was a kind of overkill and I made several different arguments, but the common-sense argument gets to the heart of the problem in a practical way. I’ll describe that argument and also give a couple of real examples.
Common sense argument against intention-to-treat
Consider an experimental comparison of two diets in which there is a simple, discrete outcome, e.g., a threshold amount of weight lost or remission of an identifiable symptom. Patients are randomly assigned to two different diets, diet group A or diet group B, and a target of, say, 5 kg weight loss is considered success. As shown in the table below, in group A, half of the subjects are able to stay on the diet but, for whatever reason, half are not. The half of the patients in group A who did stay on the diet, however, were all able to lose the target 5 kg. In group B, on the other hand, everybody is able to stay on the diet but only half are able to lose the required amount of weight. An ITT analysis shows no difference in the two outcomes, while looking only at those people who followed the diet shows 100% success for diet A. This is one of the characteristics of ITT: it always makes the better diet look worse than it is.
|   | Diet A | Diet B |
|---|---|---|
| Compliance (of 100 patients) | 50 | 100 |
| Success (reached target) | 50 | 50 |
| ITT success | 50/100 = 50% | 50/100 = 50% |
| “Per protocol” (followed diet) success | 50/50 = 100% | 50/100 = 50% |
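The two analyses in the table amount to nothing more than a choice of denominator. A minimal sketch in Python, using the hypothetical counts above:

```python
# Hypothetical trial from the table: 100 patients assigned to each diet.
assigned = {"A": 100, "B": 100}
compliant = {"A": 50, "B": 100}   # stayed on the diet
success = {"A": 50, "B": 50}      # reached the 5 kg target

def itt_success(arm):
    # ITT: successes divided by everyone *assigned*, compliant or not.
    return success[arm] / assigned[arm]

def per_protocol_success(arm):
    # Per protocol: successes divided by those who *followed* the diet.
    return success[arm] / compliant[arm]

for arm in ("A", "B"):
    print(arm, itt_success(arm), per_protocol_success(arm))
# ITT: A and B both 50%.  Per protocol: A is 100%, B is 50%.
```

Identical ITT numbers, and a factor-of-two difference the moment the denominator is restricted to people who were actually on the diet.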
Now, you are the doctor. With such data in hand, should you advise a patient: “Well, the diets are pretty much the same. It’s largely up to you which you choose,” or, looking at the raw data (both compliance and success), should the recommendation be: “Diet A is much more effective than diet B but people have trouble staying on it. If you can stay on diet A, it will be much better for you, so I would encourage you to see if you can find a way to do so.” Which makes more sense? You’re the doctor.
I made several arguments trying to explain that there are two factors, only one of which (whether it works) is clearly due to the diet. The other (whether you follow the diet) is under the control of other factors (whether WebMD tells you that one diet or the other will kill you, whether the evening news makes you lose your appetite, etc.). I even dragged in a geometric argument because Newton had used one in the Principia: “a 2-dimensional outcome space where the length of a vector tells how every subject did…. ITT represents a projection of the vector onto one axis, in other words collapses a two-dimensional vector to a one-dimensional vector, thereby losing part of the information.” Pretentious? Moi?
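For what it’s worth, the projection can be made concrete. A sketch, in which I record each hypothetical subject as a pair (complied, succeeded) and ITT keeps only the second coordinate:

```python
# Each subject as a 2-D outcome: (complied with diet?, reached target?).
subjects = [(1, 1), (1, 0), (0, 0), (0, 1)]

# ITT projects onto the success axis, discarding compliance.
itt_view = [succeeded for complied, succeeded in subjects]
print(itt_view)  # [1, 0, 0, 1]

# A compliant failure (1, 0) and a dropout (0, 0) become identical,
# which is exactly the information the projection throws away.
```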
Why you should care. Case I. Surgery or Medicine?
Does your doctor actually read these academic studies using ITT? One can only hope not. Consider the analysis by Newell of the Coronary Artery Bypass Surgery (CABS) trial. This paper is astounding for its blanket, tendentious insistence on what is correct without any logical argument. Newell considers that the method of
“the CABS research team was impeccable. They refused to do an ‘as treated’ analysis: ‘We have refrained from comparing all patients actually operated on with all not operated on: this does not provide a measure of the value of surgery.’”
Translation: results of surgery do not provide a measure of the value of surgery. So, in the CABS trial, patients were assigned to Medicine or Surgery. The treatments actually received and the outcomes are shown in the table below. Intention-to-treat analysis was, as described by Newell, “used, correctly.” Looking at the table, you can see that a 7.8% mortality was found in those assigned to receive medical treatment (29 people out of 373 died), and a 5.3% mortality (21 deaths out of 394) for assignment to surgery. If you look at the outcomes of each modality as actually used, it turns out that medical treatment had a 9.5% (33/349) mortality rate compared with 4.1% (17/419) for surgery, an analysis that Newell says “would have wildly exaggerated the apparent value of surgery.”
**Survivors and deaths after allocation to surgery or medical treatment**

|   | Allocated medicine: received surgery | Allocated medicine: received medicine | Allocated surgery: received surgery | Allocated surgery: received medicine |
|---|---|---|---|---|
| Survived 2 years | 48 | 296 | 354 | 20 |
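Newell’s two analyses differ only in how patients are grouped. A sketch reproducing the percentages quoted above (the allocated-surgery denominator is taken as 394, the figure consistent with the quoted 5.3%):

```python
# Death counts and group sizes from the CABS discussion above.
itt_groups = {"allocated medicine": (29, 373),
              "allocated surgery": (21, 394)}
as_treated = {"received medicine": (33, 349),
              "received surgery": (17, 419)}

def mortality(groups):
    # Percent mortality = deaths / group size, rounded to one decimal.
    return {name: round(100 * deaths / n, 1)
            for name, (deaths, n) in groups.items()}

print(mortality(itt_groups))  # {'allocated medicine': 7.8, 'allocated surgery': 5.3}
print(mortality(as_treated))  # {'received medicine': 9.5, 'received surgery': 4.1}
```

Same trial, same deaths; regrouping the denominators moves the gap from 7.8% vs 5.3% to 9.5% vs 4.1%.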
Common sense suggests that appearances are not deceiving. If you were one of the 33-17 = 16 people who were still alive, you would think that it was the potential report of your death that had been exaggerated. The thing that is under the control of the patient and the physician, and which is not a feature of the particular modality, is getting the surgery implemented. Common sense dictates that a patient is interested in surgery, not the effect of being told that surgery is good. The patient has a right to expect that if they comply, the physician would avoid conditions where, as stated by Hollis, “most types of deviations from protocol would continue to occur in routine practice.” The idea that “Intention to treat analysis is … most suitable for pragmatic trials of effectiveness rather than for explanatory investigations of efficacy” assumes that practical considerations are the same everywhere and that any practitioner is locked into the same abilities or lack of abilities as the original experimenter.
What is the take-home message? One general piece of advice that I would give based on this discussion in the medical literature: don’t get sick.
Why you should care. Case II. The effect of vitamin E supplementation
A clear-cut case of how off-the-mark ITT can be is a report on the value of antioxidant supplements. The Abstract of the paper concluded that “there were no overall effects of ascorbic acid, vitamin E, or beta carotene on cardiovascular events among women at high risk for CVD.” The study was based on an ITT analysis but, on the fourth page of the paper, it turns out that removing subjects due to
“noncompliance led to a significant 13% reduction in the combined end point of CVD morbidity and mortality… with a 22% reduction in MI…, a 27% reduction in stroke…, a 23% reduction in the combination of MI, stroke, or CVD death (RR (risk ratio), 0.77; 95% CI, 0.64–0.92 [P = .005]).”
The media universally reported the conclusion from the Abstract, namely that there was no effect of vitamin E. This conclusion is correct if you think that you can measure the effect of vitamin E without taking the pill out of the bottle. Does this mean that vitamin E is really of value? The data would certainly be accepted as valuable if the statistics were applied to a study of the value of replacing barbecued pork with whole grain cereal. Again, “no effect” was the answer to the question “what happens if you are told to take vitamin E?” but it still seems reasonable that the effect of a vitamin means the effect of actually taking the vitamin.
The ITT controversy
Advocates of ITT see its principles as established and may dismiss a common sense approach as naïve. The issue is not easily resolved; statistics is not axiomatic: there is no F=ma, there is no zeroth law. A good statistics book will tell you in the Introduction that what we do in statistics is to try to find a way to quantify our intuitions. If this is not appreciated, and you do not go back to consideration of exactly what the question is that you are asking, it is easy to develop a dogmatic approach and insist on a particular statistic because it has become standard.
As I mentioned above, I had a good deal of trouble getting my original paper published and one anonymous reviewer said that “the arguments presented by the author may have applied, maybe, ten or fifteen years ago.” This criticism reminded me of Molière’s Doctor in Spite of Himself:
Sganarelle is disguised as a doctor and spouts medical double-talk with phony Latin, Greek and Hebrew to impress the client, Geronte, who is pretty dumb and mostly falls for it but:
Geronte: …there is only one thing that bothers me: the location of the liver and the heart. It seemed to me that you had them in the wrong place: the heart is on the left side but the liver is on the right side.
Sganarelle: Yes. That used to be true but we have changed all that and medicine uses an entirely new approach.
Geronte: I didn’t know that and I beg your pardon for my ignorance.
In the end, it is reasonable that scientific knowledge be based on real observations. This has never before been thought to include data that was not actually in the experiment. I doubt that nous avons changé tout cela (“we have changed all that”).