Asher Peres was a physicist, an expert in information theory who died in 2005 and was remembered for his scientific contributions as well as for his iconoclastic wit and ironic aphorisms. One of his witticisms was that “unperformed research has no results.” Peres had undoubtedly never heard of intention-to-treat (ITT), the strange statistical method that has appeared recently, primarily in the medical literature. According to ITT, the data from a subject assigned at random to an experimental group must be included in the reported outcome data for that group even if the subject does not follow the protocol, or even if they drop out of the experiment. At first hearing, the idea is counter-intuitive if not completely idiotic – why would you include people who are not in the experiment in your data? – suggesting that a substantial burden of proof rests with those who want to employ it. No such obligation is usually met, and particularly in nutrition studies, such as comparisons of isocaloric weight loss diets, ITT is frequently used with no justification and is sometimes demanded by reviewers. Not surprisingly, there is a good deal of controversy on this subject. Physiologists or chemists hearing this description usually walk away shaking their heads or immediately come up with one or another obvious reductio ad absurdum, e.g. “You mean, if nobody takes the pill, you report whether or not they got better anyway?” That’s exactly what it means.
On the naive assumption that some people really didn’t understand what was wrong with ITT — I’ve been known to make a few elementary mistakes in my life — I wrote a paper on the subject. It received negative, actually hostile, reviews from two public health journals — I include an amusing example at the end of this post. I even got substantial grief from Nutrition & Metabolism, where I was the editor at the time, but where it was finally published. The current post will be based on that paper and I will provide a couple of interesting cases from the medical literature. In the next post I will discuss a quite remarkable new instance — Foster’s two year study of low carbohydrate diets — of the abuse of common sense that is the major alternative to ITT.
To put a moderate spin on the problem, there is nothing wrong with ITT if you explicitly say what the method shows — the effect of assigning subjects to an experimental protocol; the title of my paper was “Intention-to-treat. What is the question?” If you are very circumspect about that question, then there is little problem. It is common, however, for the Abstract of a paper to correctly state that patients “were assigned to a diet,” but by the time the Results are presented, the independent variable has become, not “assignment to the diet,” but “the diet,” which most people would assume meant what people ate, rather than what they were told to eat. Caveat lector. My paper was a kind of overkill and I made several different arguments, but the common sense argument gets to the heart of the problem in a practical way. I’ll describe that argument and also give a couple of real examples.
Common sense argument against intention-to-treat
Consider an experimental comparison of two diets in which there is a simple, discrete outcome, e.g. a threshold amount of weight lost or remission of an identifiable symptom. Patients are randomly assigned to two different diets, group A or group B, and a target of, say, 5 kg weight loss is considered success. As shown in the table below, in group A, half of the subjects are able to stay on the diet but, for whatever reason, half are not. The half of the patients in group A who did stay on the diet, however, were all able to lose the target 5 kg. In group B, on the other hand, everybody is able to stay on the diet but only half are able to lose the required amount of weight. An ITT analysis shows no difference between the two outcomes, while looking only at those people who followed the diet shows 100% success for diet A. This is one of the characteristics of ITT: it always makes the better diet look worse than it is.
| | Diet A | Diet B |
|---|---|---|
| Compliance (of 100 patients) | 50 | 100 |
| Success (reached target) | 50 | 50 |
| ITT success | 50/100 = 50% | 50/100 = 50% |
| “Per protocol” (followed diet) success | 50/50 = 100% | 50/100 = 50% |
Now, you are the doctor. With such data in hand should you advise a patient: “well, the diets are pretty much the same. It’s largely up to you which you choose,” or, looking at the raw data (both compliance and success), should the recommendation be: “Diet A is much more effective than diet B but people have trouble staying on it. If you can stay on diet A, it will be much better for you so I would encourage you to see if you could find a way to do so.” Which makes more sense? You’re the doctor.
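The arithmetic behind the two analyses can be sketched in a few lines of Python. The numbers are the hypothetical ones from the table; the variable names are mine:

```python
# Hypothetical trial: 100 patients assigned to each diet.
assigned = {"A": 100, "B": 100}
compliant = {"A": 50, "B": 100}   # stayed on the diet
success = {"A": 50, "B": 50}      # reached the 5 kg target (all among the compliant)

for diet in ("A", "B"):
    itt = success[diet] / assigned[diet]            # denominator: everyone assigned
    per_protocol = success[diet] / compliant[diet]  # denominator: those who followed the diet
    print(f"Diet {diet}: ITT = {itt:.0%}, per-protocol = {per_protocol:.0%}")
# Diet A: ITT = 50%, per-protocol = 100%
# Diet B: ITT = 50%, per-protocol = 50%
```

The only difference between the two statistics is the denominator: ITT divides by everyone assigned, per-protocol divides by those who actually followed the diet.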
I made several arguments trying to explain that there are two factors, only one of which (whether the diet works) is clearly due to the diet. The other (whether you follow the diet) is under the control of other factors (whether WebMD tells you that one diet or the other will kill you, whether the evening news makes you lose your appetite, etc.). I even dragged in a geometric argument because Newton had used one in the Principia: “a 2-dimensional outcome space where the length of a vector tells how every subject did…. ITT represents a projection of the vector onto one axis, in other words collapses a two dimensional vector to a one-dimensional vector, thereby losing part of the information.” Pretentious? Moi?
Why you should care. Case I. Surgery or Medicine?
Does your doctor actually read these academic studies using ITT? One can only hope not. Consider the analysis by Newell of the Coronary Artery Bypass Surgery (CABS) trial. This paper is astounding for its blanket, tendentious insistence on what is correct without any logical argument. Newell considers that the method of
“the CABS research team was impeccable. They refused to do an ‘as treated’ analysis: ‘We have refrained from comparing all patients actually operated on with all not operated on: this does not provide a measure of the value of surgery.’”
Translation: results of surgery do not provide a measure of the value of surgery. So, in the CABS trial, patients were assigned to Medicine or Surgery. The actual method used and the outcomes are shown in the Table below. Intention-to-treat analysis was, as described by Newell, “used, correctly.” Looking at the table, you can see that a 7.8% mortality was found in those assigned to receive medical treatment (29 people out of 373 died), and a 5.3% mortality (21 deaths out of 395) for assignment to surgery. If you look at the outcomes of each modality as actually used, it turns out that medical treatment had a 9.5% (33/349) mortality rate compared with 4.1% (17/419) for surgery, an analysis that Newell says “would have wildly exaggerated the apparent value of surgery.”
Survivors and deaths after allocation to surgery or medical treatment

| | Allocated medicine, received surgery | Allocated medicine, received medicine | Allocated surgery, received surgery | Allocated surgery, received medicine |
|---|---|---|---|---|
| Survived 2 years | 48 | 296 | 354 | 20 |
| Died | 2 | 27 | 15 | 6 |
| Total | 50 | 323 | 369 | 26 |
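Both sets of rates can be recomputed directly from the table above. A short Python sketch (the counts are from the trial table; the grouping code is my own illustration):

```python
# Counts from the table: (survived, died), keyed by (allocated, received).
counts = {
    ("medicine", "surgery"):  (48, 2),
    ("medicine", "medicine"): (296, 27),
    ("surgery", "surgery"):   (354, 15),
    ("surgery", "medicine"):  (20, 6),
}

def mortality(pairs):
    """Return (deaths, total, death rate) for a list of (survived, died) pairs."""
    died = sum(d for _, d in pairs)
    total = sum(s + d for s, d in pairs)
    return died, total, died / total

# ITT: group by allocation, regardless of treatment actually received.
for alloc in ("medicine", "surgery"):
    died, total, rate = mortality([v for (a, _), v in counts.items() if a == alloc])
    print(f"allocated {alloc}: {died}/{total} = {rate:.1%}")

# As-treated: group by the treatment actually received.
for received in ("medicine", "surgery"):
    died, total, rate = mortality([v for (_, r), v in counts.items() if r == received])
    print(f"received {received}: {died}/{total} = {rate:.1%}")
```

Grouped by allocation, the mortality rates are 29/373 (7.8%) and 21/395 (5.3%); grouped by treatment received, they are 33/349 (9.5%) for medicine and 17/419 (4.1%) for surgery.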
Common sense suggests that appearances are not deceiving. If you were one of the 33-17 = 16 people who were still alive, you would think that it was the potential report of your death that had been exaggerated. The thing that is under the control of the patient and the physician, and which is not a feature of the particular modality, is getting the surgery implemented. Common sense dictates that a patient is interested in surgery, not the effect of being told that surgery is good. The patient has a right to expect that if they comply, the physician would avoid conditions where, as stated by Hollis, “most types of deviations from protocol would continue to occur in routine practice.” The idea that “Intention to treat analysis is … most suitable for pragmatic trials of effectiveness rather than for explanatory investigations of efficacy” assumes that practical considerations are the same everywhere and that any practitioner is locked into the same abilities or lack of abilities as the original experimenter.
What is the take-home message? One general piece of advice that I would give based on this discussion in the medical literature: don’t get sick.
Why you should care. Case II. The effect of vitamin E supplementation
A clear-cut case of how off-the-mark ITT can be is a report on the value of antioxidant supplements. The Abstract of the paper concluded that “there were no overall effects of ascorbic acid, vitamin E, or beta carotene on cardiovascular events among women at high risk for CVD.” The study was based on an ITT analysis but, on the fourth page of the paper, it turns out that removing subjects due to
“noncompliance led to a significant 13% reduction in the combined end point of CVD morbidity and mortality… with a 22% reduction in MI …, a 27% reduction in stroke …, a 23% reduction in the combination of MI, stroke, or CVD death (RR (risk ratio), 0.77; 95% CI, 0.64–0.92 [P = .005]).”
The media universally reported the conclusion from the Abstract, namely that there was no effect of vitamin E. This conclusion is correct if you think that you can measure the effect of vitamin E without taking the pill out of the bottle. Does this mean that vitamin E is really of value? The data would certainly be accepted as valuable if the statistics were applied to a study of the value of replacing barbecued pork with whole grain cereal. Again, “no effect” was the answer to the question “what happens if you are told to take vitamin E?” but it still seems reasonable that the effect of a vitamin means the effect of actually taking the vitamin.
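A note on the arithmetic: the percent reductions quoted in such reports are just 1 − RR, where the risk ratio RR is the event rate in the treated group divided by the rate in the control group. A trivial sketch (my own illustration):

```python
# Percent reduction implied by a risk ratio: reduction = (1 - RR) * 100.
def percent_reduction(rr):
    return round((1 - rr) * 100, 1)

print(percent_reduction(0.77))  # the RR quoted above for MI/stroke/CVD death -> 23.0
print(percent_reduction(0.78))  # a hypothetical RR of 0.78 -> 22.0
```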
The ITT controversy
Advocates of ITT see its principles as established and may dismiss a common sense approach as naïve. The issue is not easily resolved; statistics is not axiomatic: there is no F=ma, there is no zeroth law. A good statistics book will tell you in the Introduction that what we do in statistics is to try to find a way to quantify our intuitions. If this is not appreciated, and you do not go back to consideration of exactly what the question is that you are asking, it is easy to develop a dogmatic approach and insist on a particular statistic because it has become standard.
As I mentioned above, I had a good deal of trouble getting my original paper published and one anonymous reviewer said that “the arguments presented by the author may have applied, maybe, ten or fifteen years ago.” This criticism reminded me of Molière’s Doctor in Spite of Himself:
Sganarelle is disguised as a doctor and spouts medical double-talk with phony Latin, Greek and Hebrew to impress the client, Geronte, who is pretty dumb and mostly falls for it but:
Geronte: …there is only one thing that bothers me: the location of the liver and the heart. It seemed to me that you had them in the wrong place: the heart is on the left side but the liver is on the right side.
Sganarelle: Yes. That used to be true, but we have changed all that, and medicine uses an entirely new approach.
Geronte: I didn’t know that and I beg your pardon for my ignorance.
In the end, it is reasonable that scientific knowledge be based on real observations. This has never before been thought to include data that was not actually in the experiment. I doubt that nous avons changé tout cela (“we have changed all that”).
See Jaynes’ book “Probability Theory: The Logic of Science” for an axiomatic framework for scientific inference. Skilling and Sivia give a lighter-weight, more application-oriented approach.
I will check these out but, stating my prejudice up front, most such analyses discuss the structure of science rather than the structure of scientific behavior, which is not axiomatic and largely unknown for the same reason that many cognitive behaviors are not well understood: hard-to-quantify reinforcers, like control. Like any behavioral analysis, a theory of scientific behavior would have to explain the maladaptive behavior of believing in ITT as well as the behavior when science works right. There is no way to logically convince anybody of a scientific law. That is why I have tried to develop a thread on science and the law. Common law is by definition not axiomatic, and it may help, in dealing with the nutrition mess, to see how evidence is handled in the courts. I’m afraid I agree with Steven Weinberg, who echoed Justice Stewart’s take on pornography: we know it when we see it.
The reference to common law is interesting. Common law differs from science in that common law requires a decision in the case before it, as the most plausible decision based on incomplete knowledge. The common law develops in response to experience, endeavouring to express the experience coherently in concepts. Common law concepts thus evolve. The common law evolves much as a scientific theory does, with precedents (narratives: situation, event, outcome) as data points in lieu of experiments (narratives: situation, event, outcome). It is noteworthy that Sir Francis Bacon was Lord Chancellor when he wrote Novum Organum.
Nutritional decisions are like common law in that I have to decide today what to eat based on incomplete scientific knowledge. I develop a common law of nutritional rules, that should evolve in response to experience and improved scientific knowledge. Where nutrition goes astray, in my view, is by becoming dogmatic and suffering confirmation bias, where law would generally adapt its concept set more readily in response to recalcitrant experience.
Thanks for your comments on this. The relation between courts of law and science is the subject of some earlier posts. The discussion is always about whether we focus on the application of either in its ideal form or in its breakdown. My original point was that in “evidence-based medicine” there is a breakdown which could be enlightened by how evidence is treated in court: in court, a judge determines admissibility, whereas in medicine proponents read the evidence as they see fit. I referred to Daubert, which is the major case on scientific evidence in law. Originally intended to go beyond simple reliance on general acceptance or expert opinion, it tried to set up flexible criteria more suitable to scientific standards. In the real world of legal practice, however, Daubert has not played out well, frequently putting undue pressure on plaintiffs in toxic tort cases, who might not have the resources to meet what are now seen as rigid standards. Because the law is not axiomatic, it can rely on common sense, which is what has been discarded in intention-to-treat. Also, peer review is supposed to supply some of the safeguards that are contained in cross-examination, but peer review is so badly damaged as to be irrelevant. Can you imagine what Sam Waterston would do to an expert who claimed that you have to include data on vitamin E from people who lost the bottle of vitamins?
The common law of evidence develops in the same manner as other areas of common law and should adapt with experience. A fundamental principle of evidence law is the disciplining effect of the opportunity for vigorous cross-examination to reveal undisclosed assumptions, latent ambiguities and biases. This enables the decision-maker to assess and gauge the relative credibility and importance of testimony. As you observe, this seems a bit lacking in the initial production and publication of medical research. It seems that much more is continually claimed in reports of research than is demonstrated. The peer review system does not seem to discipline this all that well, and lessons could be learned from law.
Common law also imposes a system of onuses, which for policy reasons puts the initial burden of proof on one side or the other. A burden to prove causation to a scientific standard on a tort plaintiff promotes a policy of liberty of action (for defendants), whereas an assumption of causation from more general indications (res ipsa loquitur) promotes access to courts. I think it is fair to say that here in Canada, where we have universal medical care provided by the state, there is less pressure to use the law of evidence to promote compensation for injury through the tort system. The adoption of stricter scientific evidence requirements in toxic tort claims may reflect a policy shift toward business, or might just be a court being over-enthused to appear scientific. Cases that are “out there” tend to get pared back soon enough.
On second thought, although I haven’t looked at it in a while, Bayesian statistics is fundamentally behavioral in nature, right? Scientific theories are “reinforced” as evidence is accumulated. So Bayesian statistics may be similar to, or could be made consistent with, behavioral psychology. No? Has anybody ever studied that?
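As a minimal sketch of that idea (my own illustration, not from any study): Bayes’ rule does make belief in a hypothesis grow with each piece of confirming evidence, which at least superficially resembles reinforcement. The likelihood values below are arbitrary:

```python
# Bayes' rule for a binary hypothesis H vs not-H:
# P(H | E) = P(H) * P(E | H) / [P(H) * P(E | H) + P(not-H) * P(E | not-H)]
def update(prior, likelihood_h, likelihood_not_h):
    numerator = prior * likelihood_h
    return numerator / (numerator + (1 - prior) * likelihood_not_h)

p = 0.5  # start agnostic about H
for _ in range(3):  # three observations, each twice as likely under H as under not-H
    p = update(p, likelihood_h=0.8, likelihood_not_h=0.4)
    print(round(p, 3))  # belief in H is "reinforced" by each confirming observation
```

After three such observations the posterior is 8/9 ≈ 0.889, each update strengthening the belief the previous one left off with.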
Thanks for bringing this topic up again and again until the necessary changes are made.
http://exceptionallybrash.blogspot.com/2011/08/it-is-what-it-is.html
This is a field that is run by physicians who can’t say to a patient “I don’t know,” or at least, cannot avoid acting on what they do know, that is, on what it might be rather than, as you say, “it is what it is.” This is why physicians are admired (most of the time) because few of us can shoulder that responsibility. Unfortunately, that’s not how science is done. If you don’t know, you don’t know. It is what it is, not what it might be (or what your intention is for it to be).