Posts Tagged ‘epidemiology’

The attack was quite sudden although it appeared to have been planned for many years. The paper was published last week (Augustin LS, Kendall CW, Jenkins DJ, Willett WC, Astrup A, Barclay AW, Bjorck I, Brand-Miller JC, Brighenti F, Buyken AE et al: Glycemic index, glycemic load and glycemic response: An International Scientific Consensus Summit from the International Carbohydrate Quality Consortium (ICQC). Nutr Metab Cardiovasc Dis 2015, 25(9):795-815).


As indicated by the title, responsibility was taken by the self-proclaimed ICQC. It turned out to be a continuation of the long-standing attempt to use the glycemic index to co-opt the obvious benefits of controlling the glucose-insulin axis while simultaneously attacking real low-carbohydrate diets. The authors participated in training in Stresa, Italy.

The operation was largely passive-aggressive. While admitting the importance of dietary carbohydrate in controlling postprandial glycemia, the authors ignored low-carbohydrate diets. Well, not exactly. The authors actually had a strong attack. The Abstract of the paper said (my emphasis):

“Background and aims: The positive and negative health effects of dietary carbohydrates are of interest to both researchers and consumers.”

“Methods: International experts on carbohydrate research held a scientific summit in Stresa, Italy, in June 2013 to discuss controversies surrounding the utility of the glycemic index (GI), glycemic load (GL) and glycemic response (GR).”

So, for the record, the paper is about dietary carbohydrate and about controversies.

The Results in Augustin, et al were simply:

“The outcome was a scientific consensus statement which recognized the importance of postprandial glycemia in overall health, and the GI as a valid and reproducible method of classifying carbohydrate foods for this purpose…. Diets of low GI and GL were considered particularly important in individuals with insulin resistance.”

A definition is always a reproducible way of classifying things, and the conclusion is not controversial: glycemia is important. Low-GI diets are a weak form of low-carbohydrate diet, and they are frequently described as a politically correct form of carbohydrate restriction. A low-GI diet is at least a subset of carbohydrate restriction, and one of the “controversies” cited in the Abstract is, sensibly, whether it is better or worse than total carbohydrate restriction. Astoundingly, this part of the controversy was ignored by the authors. Our recent review of carbohydrate restriction in diabetes had this comparison:

[Figure: comparison of a low-glycemic-index diet and a low-carbohydrate diet, from Eric Westman’s lab.]

A question of research integrity.

It is considered normal scientific protocol, in any scientific field and especially in one that is controversial, to consider and cite alternative or competing points of view. So how do the authors see low-carbohydrate diets fitting in? If you search the pdf of Augustin, et al for “low-carbohydrate” or “low carbohydrate,” there are only two instances in the text:

“Very low carbohydrate-high protein diets also have beneficial effects on weight control and some cardiovascular risk factors (not LDL-cholesterol) in the short term, but are associated with increased mortality in long term cohort studies [156],”

and

“The lowest level of postprandial glycemia is achieved using very low carbohydrate-high protein diets, but these cannot be recommended for long term use.”

There are no references for the second statement, but very low carbohydrate diets can be, and frequently are, recommended for long-term use, with good results. I am not aware of the “increased mortality in long term cohort studies” in the first statement. In fact, low-carbohydrate diets are frequently criticized for not having been subjected to long-term studies. So it was important to check out the study (or studies) in reference 156:

[156] Pagona L, Sven S, Marie L, Dimitrios T, Hans-Olov A, Elisabete W. Low carbohydrate-high protein diet and incidence of cardiovascular diseases in Swedish women: prospective cohort study. BMJ 2012;344.

Documenting increased mortality.

The paper is not about mortality but rather about cardiovascular disease and, oddly, the authors are listed by their first names. (Actual reference: Lagiou P, Sandin S, Lof M, Trichopoulos D, Adami HO, Weiderpass E: Low carbohydrate-high protein diet and incidence of cardiovascular diseases in Swedish women: prospective cohort study. BMJ 2012, 344:e4026.) This minor error probably reflects the close-knit “old boys” circle that functions on a first-name basis, although it may also indicate that the reference was not actually read, so nobody discovered what it was really about.

Anyway, even though it is about cardiovascular disease, it is worth checking out. Who wants increased risk of anything? So what do Lagiou, et al say?

The Abstract of Lagiou says (my emphasis) “Main outcome measures: Association of incident cardiovascular diseases … with decreasing carbohydrate intake (in tenths), increasing protein intake (in tenths), and an additive combination of these variables (low carbohydrate-high protein score, from 2 to 20), adjusted for intake of energy, intake of saturated and unsaturated fat, and several non-dietary variables.”

Low-carbohydrate score? There were no low-carbohydrate diets. There were no diets at all. This was an analysis of “43,396 Swedish women, aged 30-49 years at baseline, [who] completed an extensive dietary questionnaire and were followed-up for an average of 15.7 years.” The independent variable, however, was only the “score,” which the authors made up and which, as you might guess, was not seen, and certainly not approved, by anybody with actual experience with low-carbohydrate diets. And it turns out that “Among the women studied, carbohydrate intake at the low extreme of the distribution was higher and protein intake at the high extreme of the distribution was lower than the respective intakes prescribed by many weight control diets.” (In social media, this is called a “face-palm.”)

Whatever the method, though, I wanted to know how bad it was. The 12 years or so that I have been continuously on a low-carbohydrate diet might be considered pretty long term. What is my risk of CVD?

“Results: A one tenth decrease in carbohydrate intake or increase in protein intake or a 2 unit increase in the low carbohydrate-high protein score were all statistically significantly associated with increasing incidence of cardiovascular disease overall (n=1270)—incidence rate ratio estimates 1.04 (95% confidence interval 1.00 to 1.08), 1.04 (1.02 to 1.06), and 1.05 (1.02 to 1.08).”

Rate ratio 1.04? And that’s an estimate. That’s odds of 51:49. That’s what I am supposed to be worried about. But that’s the relative risk. What about the absolute risk? There were 43,396 women in the study with 1,270 cases, or 2.9 % incidence overall. A 4 % relative increase on a base of about 2.9 % is an absolute difference of about 0.04 × 2.9 % ≈ 0.12 %, or roughly 1/10 of 1 %.
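The arithmetic behind the absolute risk can be written out as a short Python sketch. It uses only the numbers quoted from Lagiou, et al, plus one simplifying assumption that is not in the paper: the overall incidence of about 2.9 % is treated as the baseline rate to which the rate ratio of 1.04 is applied.

```python
# Sketch: relative vs absolute risk using the Lagiou et al. figures quoted above.
# Simplifying assumption (not from the paper): the overall incidence stands in
# for the baseline rate to which the rate ratio is applied.
n_women = 43_396
n_cases = 1_270
rate_ratio = 1.04  # reported incidence rate ratio

baseline = n_cases / n_women               # about 0.029, i.e. 2.9 %
elevated = baseline * rate_ratio           # baseline raised by 4 % (relative)
absolute_difference = elevated - baseline  # about 0.0012, i.e. 0.12 %

# The same ratio expressed as a split of cases between two equal-sized groups.
share_higher = rate_ratio / (1 + rate_ratio)

print(f"baseline incidence:  {baseline:.2%}")
print(f"absolute difference: {absolute_difference:.2%}")
print(f"split of cases:      {share_higher:.0%} vs {1 - share_higher:.0%}")
```

Run as written, this prints a baseline of 2.93 %, an absolute difference of 0.12 % and a 51 % vs 49 % split.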

Can such low numbers be meaningful? The usual answer is that if we scale them up to the whole population, we will save thousands of lives. Can we do that? Well, you can if the data are strong, that is, if we are really sure of the reliability of the independent variable. The absolute difference in risk in the Salk polio vaccine trial, for example, was in this ballpark, but scaling up obviously paid off. In the Salk vaccine trial, however, we knew who got the vaccine and who didn’t. Food questionnaires, by contrast, have a bad reputation. Here is Lagiou’s description (you don’t really have to read this):

“We estimated the energy adjusted intakes of protein and carbohydrates for each woman, using the ‘residual method.’ This method allows evaluation of the “effect” of an energy generating nutrient, controlling for the energy generated by this nutrient, by using a simple regression of that nutrient on energy intake.…” and so on. I am not sure what it means but it certainly sounds like an estimate. So is the data itself any good? Well,

“After controlling for energy intake, however, distinguishing the effects of a specific energy generating nutrient is all but impossible, as a decrease in the intake of one is unavoidably linked to an increase in the intake of one or several of the others. Nevertheless, in this context, a low carbohydrate-high protein score allows the assessment of most low carbohydrate diets, which are generally high protein diets, because it integrates opposite changes of two nutrients with equivalent energy values.”

And “The long interval between exposure and outcome is a source of concern, because certain participants may change their dietary habits during the intervening period.”

Translation: we don’t really know what we did here.
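For anyone who does want to see what the “residual method” quoted above amounts to, here is a minimal Python sketch of energy adjustment as it is generally described: regress the nutrient on total energy intake, keep each person’s residual, and add back the intake predicted at the mean energy intake so the result stays in grams. The function name `residual_adjust` and the five people’s intakes are hypothetical, made up for illustration; this is not Lagiou’s code or data.

```python
from statistics import mean


def residual_adjust(nutrient, energy):
    """Energy-adjust nutrient intakes with the residual method (sketch).

    Fits nutrient = a + b * energy by ordinary least squares, takes each
    person's residual, and adds back the intake predicted at the mean
    energy intake (for a least-squares line, that is the mean nutrient
    intake) so the adjusted values stay in the original units.
    """
    e_bar = mean(energy)
    n_bar = mean(nutrient)
    sxy = sum((e - e_bar) * (x - n_bar) for e, x in zip(energy, nutrient))
    sxx = sum((e - e_bar) ** 2 for e in energy)
    slope = sxy / sxx
    intercept = n_bar - slope * e_bar
    residuals = [x - (intercept + slope * e) for e, x in zip(energy, nutrient)]
    return [r + n_bar for r in residuals]


# Hypothetical intakes for five people (kcal/day and g carbohydrate/day).
energy = [1600, 1800, 2000, 2200, 2400]
carbs = [200, 230, 240, 280, 300]

adjusted = residual_adjust(carbs, energy)
print(adjusted)  # [250.0, 255.0, 240.0, 255.0, 250.0]
```

The adjusted carbohydrate values no longer rise with energy intake; what remains is how far each person’s reported intake sits above or below what would be expected for their energy intake. Every number that goes in is still a questionnaire estimate.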

In the end, Lagiou, et al admit “Our results do not answer questions concerning possible beneficial short term effects of low carbohydrate or high protein diets in the control of body weight or insulin resistance. Instead, they draw attention to the potential for considerable adverse effects on cardiovascular health of these diets….” Instead? I thought insulin resistance had an effect on CVD, but if roughly 1/10 of 1 % is “considerable adverse effects,” what would something “almost zero” be?

Coming back to the original paper by Augustin, et al, what about the comparison between low-GI diets and low-carbohydrate diets? The comparison in the figure above comes from Eric Westman’s lab. What do Augustin, et al have to say about that?

[Image: search of the Augustin, et al pdf for “Westman.”]

They missed this paper. Note: a comment I received suggested that I should have searched on “Eric” instead of “Westman.” Ha.

Overall, this is the evidence used by ICQC to tell you that low-carbohydrate diets would kill you. In the end, Augustin, et al is a hatchet-job, citing a meaningless paper at random. It is hard to understand why the journal took it. I will ask the editors to retract it.

“…789 deaths were reported in Doll and Hill’s original cohort. Thirty-six of these were attributed to lung cancer. When these lung cancer deaths were counted in smokers versus non-smokers, the correlation virtually sprang out: all thirty-six of the deaths had occurred in smokers. The difference between the two groups was so significant that Doll and Hill did not even need to apply complex statistical metrics to discern it. The trial designed to bring the most rigorous statistical analysis to the cause of lung cancer barely required elementary mathematics to prove his point.”

Siddhartha Mukherjee, The Emperor of All Maladies.

Scientists don’t like philosophy of science. It is not just that pompous phrases like hypothetico-deductive systems are such a turn-off but that we rarely recognize it as what we actually do. In the end, there is no definition of science, and it is hard to generalize about actual scientific behavior. It’s a human activity and, precisely because it puts a premium on creativity, it defies categorization. As the physicist Steven Weinberg put it, echoing Justice Stewart on pornography:

“There is no logical formula that establishes a sharp dividing line between a beautiful explanatory theory and a mere list of data, but we know the difference when we see it — we demand a simplicity and rigidity in our principles before we are willing to take them seriously [1].”

A frequently stated principle is that “observational studies only generate hypotheses.” The related idea that “association does not imply causality” is also common, usually cited by those authors who want you to believe that the association that they found does imply causality. These ideas are not right or, at least, they insufficiently recognize that scientific experiments are not so easily wedged into categories like “observational studies.” The principles are also invoked by bloggers and critics to discredit the continuing stream of observational studies that find an association between one of their favorite targets (eggs, red meat, sugar-sweetened soda) and a metabolic disease or cancer. In most cases, the studies are getting what they deserve, but the bills of indictment are not quite right. It is usually not simply that they are observational studies but rather that they are bad observational studies and, in any case, the associations are so weak that it is reasonable to say that they are an argument for a lack of causality. On the assumption that good experimental practice and interpretation can be even roughly defined, let me offer principles that I think are a better representation, insofar as we can make any generalization, of what actually goes on in science:

 Observations generate hypotheses. 

Observational studies test hypotheses.

Associations do not necessarily imply causality.

In some sense, all science is associations. 

Only mathematics is axiomatic.

If you notice that kids who eat a lot of candy seem to be fat, or even if you notice that candy makes you yourself fat, that is an observation. From this observation, you might come up with the hypothesis that sugar causes obesity. A test of your hypothesis would be to see if there is an association between sugar consumption and incidence of obesity. There are various ways to do this; the simplest epidemiologic approach is to compare the history of the eating behavior of individuals (insofar as you can get it) with how fat they are. When you do this comparison, you are testing your hypothesis. There are an infinite number of things that you could have measured as an independent variable (meat, TV hours, distance from the French bakery), but your hypothesis is that it was candy. Mike Eades described falling asleep as a child by trying to think of everything in the world. You just can’t test them all. As Einstein put it, “your theory determines the measurement you make.”

Associations predict causality. Hypotheses generate observational studies, not the other way around.

In fact, associations can be strong evidence for causation and frequently provide support for, if not absolute proof of, the idea to be tested. A correct statement is that association does not necessarily imply causation. In some sense, all science is observation and association. Even thermodynamics, that most mathematical and absolute of sciences, rests on observation. As soon as somebody observes two systems in thermal equilibrium with a third but not with each other (zeroth law), the jig is up. When somebody builds a perpetual motion machine, that’s it. It’s all over.

Biological mechanisms, or perhaps any scientific theories, are never proved. By analogy with a court of law, you cannot be found innocent, only not guilty. That is why excluding a theory is stronger than showing consistency. The grand epidemiological study of macronutrient intake vs diabetes and obesity shows that increasing carbohydrate is associated with increased calories even under conditions where fruits and vegetables also went up and fat, if anything, went down. It is an observational study, but it is strong because it gives support to a lack of causal effect of increased carbohydrate and decreased fat on outcome. The failure of reductions in total or saturated fat to have any benefit is the kicker here. It is now clear that prospective experiments have shown in the past, and will continue to show, the same negative outcome. Of course, in a court of law, if you are found not guilty of child abuse, people may still not let you move into their neighborhood. The point is that saturated fat should never have been indicted in the first place.

An association will tell you about causality 1) if the association is strong and 2) if there is a plausible underlying mechanism and 3) if there is no more plausible explanation — for example, countries with a lot of TV sets have modern life styles that may predispose to cardiovascular disease; TV does not cause CVD.

Re-inventing the wheel. Bradford Hill and the history of epidemiology.

Everything written above is true enough or, at least, it seemed that way to me. I thought of it as an obvious description of what everybody knows. The change to saying that “association does not necessarily imply causation” is important but not that big a deal. It is common sense or logic, and I had made it into a short list of principles. It was a blogpost of reasonable length. I described it to my colleague Gene Fine. His response was “aren’t you re-inventing the wheel?” Bradford Hill, he explained, pretty much the inventor of modern epidemiology, had already established these and a couple of other principles. Gene cited The Emperor of All Maladies, an outstanding book on the history of cancer. I had read The Emperor of All Maladies on his recommendation, and I remembered Bradford Hill and the description of the evolution of the ideas of epidemiology, population studies and randomized controlled trials. I also had a vague memory of reading the story in James Le Fanu’s The Rise and Fall of Modern Medicine, another captivating history of medicine. However, I had not really absorbed these as principles. Perhaps we’re just used to it, but saying that an association implies causality only if it is a strong association is not exactly a scientific breakthrough. It seems an obvious thing that you might say over coffee or in response to somebody’s blog. It all reminded me of learning, in grade school, that the Earl of Sandwich had invented the sandwich and thinking “this is an invention?” Woody Allen thought the same thing and wrote the history of the sandwich and the Earl’s early failures: “In 1741, he places bread on bread with turkey on top. This fails. In 1745, he exhibits bread with turkey on either side. Everyone rejects this except David Hume.”

At any moment in history, our background knowledge (and accepted methodology) may be limited. Some problems seem to have simple solutions. But simple ideas are not always accepted. The concept of the randomized controlled trial (RCT), obvious to us now, was hard won. And proving that any particular environmental factor (diet, smoking, pollution or toxic chemicals) was the cause of a disease, and that by reducing that factor the disease could be prevented, turned out to be a very hard sell, especially to physicians whose view of disease may have been strongly colored by the idea of an infective agent.

The Rise and Fall of Modern Medicine describes Bradford Hill’s two demonstrations, that streptomycin in combination with PAS (para-aminosalicylic acid) could cure tuberculosis and that tobacco causes lung cancer, as one of the Ten Definitive Moments in the history of modern medicine (others shown in the textbox). Hill was Professor of Medical Statistics at the London School of Hygiene and Tropical Medicine but was not formally trained in statistics and, like many of us, thought of proper statistics as common sense. An early, nearly fatal case of tuberculosis also prevented him from pursuing a formal medical education. His first monumental accomplishment was, ironically, to demonstrate how tuberculosis could be cured with the combination of streptomycin and PAS. In the late 1940s, Hill and co-worker Richard Doll undertook a systematic investigation of the risk factors for lung cancer. His eventual success was accompanied by a description of the principles that allow you to say when association can be taken as causation.

 Ten Definitive Moments from Rise and Fall of Modern Medicine.

1941: Penicillin

1949: Cortisone

1950: streptomycin, smoking and Sir Austin Bradford Hill

1952: chlorpromazine and the revolution in psychiatry

1955: open-heart surgery – the last frontier

1963: transplanting kidneys

1964: the triumph of prevention – the case of strokes

1971: curing childhood cancer

1978: the first ‘Test-Tube’ baby

1984: Helicobacter – the cause of peptic ulcer

Wikipedia says: “in 1965, built upon the work of Hume and Popper, Hill suggested several aspects of causality in medicine and biology…” but his approach was not formal. He never referred to his principles as criteria; he recognized them as common-sense behavior, and his 1965 presentation to the Royal Society of Medicine is a remarkably sober, intelligent document. Although it is often cited as an example of an article that, as here, is read more often in quotations and paraphrases than in the original, it is worth reading the original even today.

Note: “Austin Bradford Hill’s surname was Hill and he always used the name Hill, AB in publications. However, he is often referred to as Bradford Hill. To add to the confusion, his friends called him Tony.” (This comment is from Wikipedia, not Woody Allen).

The President’s Address

Bradford Hill’s description of the factors that might make you think an association implied causality:

[Image: Hill’s 1965 paper, “The Environment and Disease: Association or Causation?”]

1. Strength. “First upon my list I would put the strength of the association.” This, of course, is exactly what is missing in the continued epidemiological scare stories. Hill describes:

“….prospective inquiries into smoking have shown that the death rate from cancer of the lung in cigarette smokers is nine to ten times the rate in non-smokers and the rate in heavy cigarette smokers is twenty to thirty times as great.”

But further:

“On the other hand the death rate from coronary thrombosis in smokers is no more than twice, possibly less, the death rate in nonsmokers. Though there is good evidence to support causation it is surely much easier in this case to think of some features of life that may go hand-in-hand with smoking – features that might conceivably be the real underlying cause or, at the least, an important contributor, whether it be lack of exercise, nature of diet or other factors.”

Doubts about an odds ratio of two or less: that’s where you really have to wonder about causality. The epidemiologic studies that tell you that red meat, HFCS, etc. will cause diabetes, prostate cancer, or whatever rarely hit an odds ratio of 2. While the published studies may contain disclaimers of the type in Hill’s paper, the PR department of the university where the work is done, and hence the public media, show no such hesitation and will quickly attribute causality to the study as if the odds ratio were 10 instead of 1.2.
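To make the difference between a weak and a strong association concrete, here is a minimal Python sketch that computes a relative risk and an odds ratio from a 2 × 2 table, each with an approximate 95 % confidence interval on the log scale (the usual Katz and Woolf approximations). The function name `ratio_with_ci` and the cohort counts are hypothetical, chosen only to show how a ratio near 1.2 can come out “statistically significant” in a large study.

```python
import math


def ratio_with_ci(a, b, c, d, z=1.96):
    """Relative risk and odds ratio with log-scale 95 % CIs (sketch).

    2 x 2 table layout (counts):
                  cases   non-cases
      exposed       a         b
      unexposed     c         d
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))  # Katz
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf

    def ci(estimate, se):
        # Symmetric on the log scale, so asymmetric around the estimate.
        return (estimate * math.exp(-z * se), estimate * math.exp(z * se))

    return rr, ci(rr, se_log_rr), odds_ratio, ci(odds_ratio, se_log_or)


# Hypothetical cohort: 50,000 exposed and 50,000 unexposed people.
rr, rr_ci, odds, or_ci = ratio_with_ci(a=1_200, b=48_800, c=1_000, d=49_000)
print(f"RR = {rr:.2f} (95% CI {rr_ci[0]:.2f} to {rr_ci[1]:.2f})")
print(f"OR = {odds:.2f} (95% CI {or_ci[0]:.2f} to {or_ci[1]:.2f})")
```

Both intervals exclude 1, so the result would be reported as significant, yet the ratio is about 1.2, a long way from the nine- to ten-fold ratio Hill cited for lung cancer in smokers. Because the outcome is rare here, the odds ratio and the relative risk are nearly equal, which is why the two are often treated as interchangeable.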

2. Consistency: Hill listed the repetition of the results in other studies under different circumstances as a criterion for considering how much an association implied causality. Not mentioned, but of great importance, is that this test cannot be made independently of the first criterion. Consistently weak associations do not generally add up to a strong association. If there is a single practice in modern medicine that is completely out of whack with respect to careful consideration of causality, it is the meta-analysis, where studies with no strength at all are averaged so as to create a conclusion that is stronger than any of its components.
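The point about meta-analysis can be illustrated with one common pooling method, the fixed-effect, inverse-variance average of log ratios. In this hypothetical example (the five studies and the function name `pool_fixed_effect` are invented for illustration), every study reports a ratio of about 1.15 with a confidence interval that includes 1.

```python
import math

Z = 1.96  # multiplier for an approximate 95 % confidence interval


def pool_fixed_effect(studies):
    """Fixed-effect inverse-variance pooling of ratio estimates (sketch).

    Each study is (ratio, lower_95, upper_95). The standard error of the
    log ratio is recovered from the width of the interval on the log scale.
    Returns the pooled ratio and its 95 % confidence interval.
    """
    weights, weighted_logs = [], []
    for ratio, lower, upper in studies:
        se = (math.log(upper) - math.log(lower)) / (2 * Z)
        w = 1 / se ** 2  # more precise studies get more weight
        weights.append(w)
        weighted_logs.append(w * math.log(ratio))
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return (math.exp(pooled_log),
            math.exp(pooled_log - Z * pooled_se),
            math.exp(pooled_log + Z * pooled_se))


# Five hypothetical weak studies: every interval includes 1.
studies = [(1.15, 0.95, 1.39), (1.12, 0.93, 1.35), (1.18, 0.96, 1.45),
           (1.14, 0.94, 1.38), (1.16, 0.95, 1.42)]

ratio, lower, upper = pool_fixed_effect(studies)
print(f"pooled ratio {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The pooled ratio is still about 1.15, but its interval is narrow enough to exclude 1. Pooling buys precision, not strength: the estimate is no larger than its components, and any bias the studies share is carried straight into the average.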

3. Specificity. Hill was circumspect on this point, recognizing that we should have an open mind on what causes what. On specificity of cancer and cigarettes, Hill noted that the two sites in which he showed a cause and effect relationship were the lungs and the nose.

4. Temporality: Obviously, we expect the cause to precede the effect or, as some wit put it, “which got laid first, the chicken or the egg.” Hill recognized that it was not so clear for diseases that developed slowly: “Does a particular diet lead to disease or do the early stages of the disease lead to those peculiar dietetic habits?” Of current interest are the epidemiologic studies that show a correlation between diet soda and obesity, whose authors are quick to see a causal link but, naturally, one should ask “Who drinks diet soda?”

5. Biological gradient:  the association should show a dose response curve. In the case of cigarettes, the death rate from cancer of the lung increases linearly with the number of cigarettes smoked. A subset of the first principle, that the association should be strong, is that the dose-response curve should have a meaningful slope and, I would add, the numbers should be big.

6. Plausibility: On the one hand, this seems critical (the association of egg consumption with diabetes is obviously foolish), but the hypothesis to be tested may have come from an intuition that is far from evident. Hill said, “What is biologically plausible depends upon the biological knowledge of the day.”

7. Coherence: “data should not seriously conflict with the generally known facts of the natural history and biology of the disease”

8. Experiment: It was another age. It is hard to believe that it was in my lifetime. “Occasionally it is possible to appeal to experimental, or semi-experimental, evidence. For example, because of an observed association some preventive action is taken. Does it in fact prevent?” The inventor of the randomized controlled trial would be amazed at how many of these are done and how many fail to prevent. And, most of all, he would be astounded that it doesn’t seem to matter. However, the progression of failures, from Framingham to the Women’s Health Initiative, to show any association between low fat or low saturated fat and reduced cardiovascular disease is strong evidence for the absence of causation.

9. Analogy: “In some circumstances it would be fair to judge by analogy. With the effects of thalidomide and rubella before us we would surely be ready to accept slighter but similar evidence with another drug or another viral disease in pregnancy.”

Hill’s final word on what has come to be known as his criteria for deciding about causation:

“Here then are nine different viewpoints from all of which we should study association before we cry causation. What I do not believe — and this has been suggested — is that we can usefully lay down some hard-and-fast rules of evidence that must be obeyed before we accept cause and effect. None of my nine viewpoints can bring indisputable evidence for or against the cause-and-effect hypothesis and none can be required as a sine qua non. What they can do, with greater or less strength, is to help us to make up our minds on the fundamental question – is there any other way of explaining the set of facts before us, is there any other answer equally, or more, likely than cause and effect?” This may be the first critique of the still-to-be-invented Evidence-based Medicine.

Nutritional Epidemiology.

The decision to say that an observational study implies causation is equivalent to an assertion that the results are meaningful, that it is not a random association at all, that it is scientifically sound. Critics of epidemiological studies have relied on their own perceptions and on appeals to common sense. When I started this blogpost, I was one of them; I had not appreciated the importance of Bradford Hill’s principles. The Emperor of All Maladies described Hill’s strategies for dealing with association and causation “which have remained in use by epidemiologists to date.” But have they? The principles are in the texts. Epidemiology, Biostatistics, and Preventive Medicine has a chapter called “The study of causation in Epidemiologic Investigation and Research” from which the dose-response curve was modified. Are these principles being followed? Previous posts in this blog and others have voiced criticisms of epidemiology as it’s currently practiced in nutrition, but we were lacking a meaningful reference point. Looking back now, what we see is a large number of research groups doing epidemiology in violation of most of Hill’s criteria.

The red meat scare of 2011 was Pan, et al, and in a previous post I described the remarkable blog from Harvard. Their blog explained that the paper was unnecessarily scary because it had described things in terms of “relative risks, comparing death rates in the group eating the least meat with those eating the most. The absolute risks… sometimes help tell the story a bit more clearly. These numbers are somewhat less scary.” I felt it was appropriate to ask “Why does Dr. Pan not want to tell the story as clearly as possible? Isn’t that what you’re supposed to do in science? Why would you want to make it scary?” It was, of course, a rhetorical question.

Looking at Pan, et al. in light of Bradford Hill, we can examine some of their data. Figure 2 from their paper shows the risk of diabetes as a function of red meat in the diet. The variable reported is the hazard ratio, which can be considered roughly the same as the odds ratio, that is, the relative odds of getting diabetes. I have indicated, in pink, those values that are not statistically significant, and I grayed out the confidence interval to make it easy to see that these do not even hit the level of 2 that Bradford Hill saw as some kind of cut-off.

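[Figure 2 from Pan, et al.: hazard ratios for diabetes by intake of red meat, with non-significant values marked in pink and confidence intervals grayed out.]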

The hazard ratios for processed meat are somewhat higher but still less than 2. These are weak data and violate the first and most important of Hill’s criteria. As you go from quintile 2 to 3, there is an increase in risk, but at Q4 the risk goes down and then back up at Q5, contrary to principle 5, which suggests the importance of dose-response curves. But, stepping back and asking what the whole idea is, asking why you would think that meat has a major role, isolatable from everything else, in a disease of carbohydrate intolerance, you see that this is not rational; this is not science. And Pan is not making random observations. This is a test of the hypothesis that red meat causes diabetes. Most of us would say that it didn’t make any sense to test such a hypothesis, but, in any case, the results do not support it.
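A crude way to apply principle 5 is to ask whether the ratios across ordered intake categories rise steadily, and how steeply. The sketch below does both for two sets of quintile ratios; the values and the helper names `is_monotone_increasing` and `log_trend_slope` are hypothetical (they are not Pan’s numbers), and this is an illustration of the idea rather than the formal trend tests used in published analyses.

```python
import math


def is_monotone_increasing(values):
    """True if each category's ratio is at least as large as the previous one."""
    return all(later >= earlier for earlier, later in zip(values, values[1:]))


def log_trend_slope(values):
    """Least-squares slope of log(ratio) against category index 0, 1, 2, ...

    exp(slope) is roughly the multiplicative change in risk per category.
    """
    xs = range(len(values))
    ys = [math.log(v) for v in values]
    x_bar = sum(xs) / len(values)
    y_bar = sum(ys) / len(values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical ratios by quintile of intake (Q1 is the reference, 1.0).
steep = [1.0, 2.0, 4.0, 8.0, 16.0]    # strong, steady gradient
weak = [1.0, 1.05, 1.15, 1.10, 1.20]  # small ratios with a dip at Q4

for label, ratios in [("steep", steep), ("weak", weak)]:
    slope = log_trend_slope(ratios)
    print(label, is_monotone_increasing(ratios), round(math.exp(slope), 2))
```

The steep series is monotone and rises by a factor of about 2 per quintile; the weak series is not monotone and its fitted increase is only about 4 % per quintile.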

What is science?

Science is a human activity, and what we don’t like about philosophy of science is that it is about the structure and formalism of science rather than what scientists really do, so there aren’t even any real definitions. One description that I like, from a colleague at the NIH: “What you do in science, is you make a hypothesis and then you try to shoot yourself down.” One of the more interesting sidelights on the work of Hill and Doll, as described in Emperor, was that during breaks from the taxing work of analyzing the questionnaires that provided the background on smoking, Doll himself would step out for a smoke. Doll believed that cigarettes were unlikely to be a cause (he favored tar from paved highways as the causative agent), but as the data came in, “in the middle of the survey, sufficiently alarmed, he gave up smoking.” In science, you try to shoot yourself down and, in the end, you go with the data.

Asher Peres was a physicist, an expert in information theory who died in 2005 and was remembered for his scientific contributions as well as for his iconoclastic wit and ironic aphorisms. One of his witticisms was that “unperformed experiments have no results.” Peres had undoubtedly never heard of intention-to-treat (ITT), the strange statistical method that has appeared recently, primarily in the medical literature. According to ITT, the data from a subject assigned at random to an experimental group must be included in the reported outcome data for that group even if the subject does not follow the protocol, or even if they drop out of the experiment. At first hearing, the idea is counter-intuitive if not completely idiotic (why would you include people who are not in the experiment in your data?), suggesting that a substantial burden of proof rests with those who want to employ it. No such obligation is usually met, and, particularly in nutrition studies such as comparisons of isocaloric weight loss diets, ITT is frequently used with no justification and sometimes demanded by reviewers. Not surprisingly, there is a good deal of controversy on this subject. Physiologists or chemists, hearing this description, usually walk away shaking their heads or immediately come up with one or another obvious reductio ad absurdum, e.g. “You mean, if nobody takes the pill, you report whether or not they got better anyway?” That’s exactly what it means.

On the naive assumption that some people really didn’t understand what was wrong with ITT (I’ve been known to make a few elementary mistakes in my life), I wrote a paper on the subject. It received negative, actually hostile, reviews from two public health journals; I include an amusing example at the end of this post. I even got substantial grief from Nutrition & Metabolism, where I was the editor at the time, but where it was finally published. The current post will be based on that paper, and I will provide a couple of interesting cases from the medical literature. In the next post I will discuss a quite remarkable new instance, Foster’s two-year study of low carbohydrate diets, of the abuse of common sense that is the major alternative to ITT.

To put a moderate spin on the problem, there is nothing wrong with ITT if you explicitly say what the method shows: the effect of assigning subjects to an experimental protocol. The title of my paper was “Intention-to-treat. What is the question?” If you are very circumspect about that question, then there is little problem. It is common, however, for the Abstract of a paper to correctly state that patients “were assigned to a diet” but, by the time the Results are presented, the independent variable has become not “assignment to the diet” but “the diet,” which most people would assume meant what people ate rather than what they were told to eat. Caveat lector. My paper was a kind of overkill and I made several different arguments, but the common sense argument gets to the heart of the problem in a practical way. I’ll describe that argument and also give a couple of real examples.

Common sense argument against intention-to-treat

Consider an experimental comparison of two diets in which there is a simple, discrete outcome, e.g. a threshold amount of weight lost or remission of an identifiable symptom. Patients are randomly assigned to two different diets, diet group A or diet group B, and a target of, say, 5 kg weight loss is considered success. As shown in the table below, in group A, half of the subjects are able to stay on the diet but, for whatever reason, half are not. The half of the patients in group A who did stay on the diet, however, were all able to lose the target 5 kg. In group B, on the other hand, everybody is able to stay on the diet but only half are able to lose the required amount of weight. An ITT analysis shows no difference in the two outcomes, while just looking at those people who followed diet A shows 100 % success. This is one of the characteristics of ITT: it always makes the better diet look worse than it is.

                                           Diet A           Diet B
Compliance (of 100 patients)               50               100
Success (reached target)                   50               50
ITT success                                50/100 = 50%     50/100 = 50%
“Per protocol” (followed diet) success     50/50 = 100%     50/100 = 50%
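To make the arithmetic in the table explicit, here is a minimal sketch that computes both rates from the raw counts. The counts are the hypothetical ones from the thought experiment above; the `Arm` type and the function names are illustrative only, and the sketch assumes, as the example does, that only patients who stayed on the diet could reach the 5 kg target.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """One randomized group in the hypothetical two-diet comparison."""
    name: str
    assigned: int   # patients randomized to this diet
    compliant: int  # patients who actually followed the diet
    successes: int  # patients who reached the 5 kg target


def itt_success(arm: Arm) -> float:
    """Intention-to-treat: successes divided by everyone assigned."""
    return arm.successes / arm.assigned


def per_protocol_success(arm: Arm) -> float:
    """Per protocol: successes divided by those who followed the diet."""
    return arm.successes / arm.compliant


arms = [Arm("Diet A", 100, 50, 50), Arm("Diet B", 100, 100, 50)]

for arm in arms:
    print(f"{arm.name}: compliance {arm.compliant / arm.assigned:.0%}, "
          f"ITT {itt_success(arm):.0%}, per protocol {per_protocol_success(arm):.0%}")
```

The ITT column comes out identical for the two diets, even though the per-protocol column shows that diet A worked for everyone who followed it.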

Now, you are the doctor.  With such data in hand should you advise a patient: “well, the diets are pretty much the same. It’s largely up to you which you choose,” or, looking at the raw data (both compliance and success), should the recommendation be: “Diet A is much more effective than diet B but people have trouble staying on it. If you can stay on diet A, it will be much better for you so I would encourage you to see if you could find a way to do so.” Which makes more sense? You’re the doctor.

I made several arguments trying to explain that there are two factors, only one of which (whether it works) is clearly due to the diet. The other (whether you follow the diet) is under control of other factors (whether WebMD tells you that one diet or the other will kill you, whether the evening news makes you lose your appetite, etc.)  I even dragged in a geometric argument because Newton had used one in the Principia: “a 2-dimensional outcome space where the length of a vector tells how every subject did…. ITT represents a projection of the vector onto one axis, in other words collapses a two dimensional vector to a one-dimensional vector, thereby losing part of the information.” Pretentious? Moi?
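One rough way to see the information loss the geometric argument describes is a sketch under a simplifying assumption that is mine, not the paper’s: if people who do not follow a diet never reach the target, the ITT success rate is just compliance multiplied by efficacy among those who complied. That is a product rather than a literal projection, but it makes the same point. Different (compliance, efficacy) pairs, including diets A and B from the table and a made-up “Diet C,” collapse to the same single number.

```python
def itt_from_components(compliance: float, efficacy: float) -> float:
    """Collapse a (compliance, efficacy) pair into one ITT success rate.

    Assumes (hypothetically) that non-compliers never reach the target,
    so ITT success = fraction who comply * success rate among compliers.
    """
    return compliance * efficacy


# (compliance, efficacy among those who followed the diet)
outcomes = {
    "Diet A": (0.5, 1.0),
    "Diet B": (1.0, 0.5),
    "Hypothetical Diet C": (0.625, 0.8),
}

itt_values = {}
for name, (compliance, efficacy) in outcomes.items():
    itt_values[name] = itt_from_components(compliance, efficacy)
    print(f"{name}: compliance={compliance}, efficacy={efficacy}, "
          f"ITT={itt_values[name]:.2f}")
```

All three print an ITT of 0.50. From that one number alone you cannot recover which diet works and which one people simply find easier to stay on.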

Why you should care.  Case I. Surgery or Medicine?

Does your doctor actually read these academic studies using ITT? One can only hope not. Consider the analysis by Newell of the Coronary Artery Bypass Surgery (CABS) trial. This paper is astounding for its blanket, tendentious insistence on what is correct without any logical argument. Newell considers that the method of

“the CABS research team was impeccable. They refused to do an ‘as treated’ analysis: ‘We have refrained from comparing all patients actually operated on with all not operated on: this does not provide a measure of the value of surgery.’”

Translation: results of surgery do not provide a measure of the value of surgery. So, in the CABS trial, patients were assigned to Medicine or Surgery. The actual method used and the outcomes are shown in the table below. Intention-to-treat analysis was, as described by Newell, “used, correctly.” Looking at the table, you can see that a 7.8% mortality was found in those assigned to receive medical treatment (29 people out of 373 died), and a 5.3% mortality (21 deaths out of 395) for assignment to surgery. If you look at the outcomes of each modality as actually used, it turns out that medical treatment had a 9.5% (33/349) mortality rate compared with 4.1% (17/419) for surgery, an analysis that Newell says “would have wildly exaggerated the apparent value of surgery.”

Survivors and deaths after allocation to surgery or medical treatment

                     Allocated medicine                       Allocated surgery
                     Received surgery   Received medicine     Received surgery   Received medicine
Survived 2 years     48                 296                   354                20
Died                 2                  27                    15                 6
Total                50                 323                   369                26
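The two ways of reading this table differ only in how the four columns are grouped. The sketch below simply re-adds the counts transcribed from the table: once by allocation (ITT) and once by the treatment actually received (as treated). The dictionary layout and function name are illustrative choices. Note that the column totals give 369 + 26 = 395 patients allocated to surgery.

```python
# Counts transcribed from the CABS table: (survived 2 years, died).
# Keys are (allocated to, actually received).
table = {
    ("medicine", "surgery"): (48, 2),
    ("medicine", "medicine"): (296, 27),
    ("surgery", "surgery"): (354, 15),
    ("surgery", "medicine"): (20, 6),
}


def mortality(cells):
    """Return (deaths, total, death rate) summed over the given table cells."""
    deaths = sum(table[c][1] for c in cells)
    total = sum(sum(table[c]) for c in cells)
    return deaths, total, deaths / total


# Intention to treat: group by allocation, whatever treatment was received.
itt = {arm: mortality([c for c in table if c[0] == arm])
       for arm in ("medicine", "surgery")}

# As treated: group by the treatment actually received.
as_treated = {rx: mortality([c for c in table if c[1] == rx])
              for rx in ("medicine", "surgery")}

for label, result in (("ITT", itt), ("As treated", as_treated)):
    for arm, (deaths, total, rate) in result.items():
        print(f"{label:10s} {arm:8s} {deaths}/{total} = {rate:.1%}")
```

This reproduces the four figures quoted in the text: 29/373 and 21/395 by allocation, 33/349 and 17/419 by treatment received.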

Common sense suggests that appearances are not deceiving. If you were one of the 33-17 = 16 people who were still alive, you would think that it was the potential report of your death that had been exaggerated. The thing that is under the control of the patient and the physician, and which is not a feature of the particular modality, is getting the surgery implemented. Common sense dictates that a patient is interested in surgery, not the effect of being told that surgery is good. The patient has a right to expect that if they comply, the physician would avoid conditions where, as stated by Hollis, “most types of deviations from protocol would continue to occur in routine practice.” The idea that “Intention to treat analysis is … most suitable for pragmatic trials of effectiveness rather than for explanatory investigations of efficacy” assumes that practical considerations are the same everywhere and that any practitioner is locked into the same abilities or lack of abilities as the original experimenter.

What is the take-home message? One general piece of advice that I would give based on this discussion in the medical literature: don’t get sick.

Why you should care.  Case II. The effect of vitamin E supplementation

A clear-cut case of how off-the-mark ITT can be is a report on the value of antioxidant supplements. The Abstract of the paper concluded that “there were no overall effects of ascorbic acid, vitamin E, or beta carotene on cardiovascular events among women at high risk for CVD.” The study was based on an ITT analysis but, on the fourth page of the paper, it turns out that removing subjects due to

“noncompliance led to a significant 13% reduction in the combined end point of CVD morbidity and mortality… with a 22% reduction in MI …, a 27% reduction in stroke …. a 23% reduction in the combination of MI, stroke, or CVD death (RR (risk ratio), 0.77; 95% CI, 0.64–0.92 [P = .005]).”

The media universally reported the conclusion from the Abstract, namely that there was no effect of vitamin E. This conclusion is correct if you think that you can measure the effect of vitamin E without taking the pill out of the bottle. Does this mean that vitamin E is really of value? The data would certainly be accepted as valuable if the statistics were applied to a study of the value of replacing barbecued pork with whole grain cereal. Again, “no effect” was the answer to the question “what happens if you are told to take vitamin E?” but it still seems reasonable that the effect of a vitamin means the effect of actually taking the vitamin.

The ITT controversy

Advocates of ITT see its principles as established and may dismiss a common sense approach as naïve. The issue is not easily resolved; statistics is not axiomatic: there is no F=ma, there is no zeroth law.  A good statistics book will tell you in the Introduction that what we do in statistics is to try to find a way to quantify our intuitions. If this is not appreciated, and you do not go back to consideration of exactly what the question is that you are asking, it is easy to develop a dogmatic approach and insist on a particular statistic because it has become standard.

As I mentioned above, I had a good deal of trouble getting my original paper published and one anonymous reviewer said that “the arguments presented by the author may have applied, maybe, ten or fifteen years ago.” This criticism reminded me of Molière’s Doctor in Spite of Himself:

Sganarelle is disguised as a doctor and spouts medical double-talk with phony Latin, Greek and Hebrew to impress the client, Geronte, who is pretty dumb and mostly falls for it but:

Geronte: …there is only one thing that bothers me: the location of the liver and the heart. It seemed to me that you had them in the wrong place: the heart is on the left side but the liver is on the right side.

Sganarelle: Yes. That used to be true but we have changed all that and medicine uses an entirely new approach.

Geronte: I didn’t know that and I beg your pardon for my ignorance.

In the end, it is reasonable that scientific knowledge be based on real observations. This has never before been thought to include data that was not actually in the experiment. I doubt that nous avons changé tout cela.

A guide for consumers and the media

For Mark Twain’s hierarchy of lies, damned lies and statistics, we should really add epidemiological lies, those reports showing that brown rice or trans-palmitoleic acid will prevent diabetes and diet soda will make you fat, which appear every week or so in ABCNews. (I mean the generic media, but ABCNews and I have a close relationship: sometimes they even print what I tell them.) If you’ve been eating white rice instead of brown rice and you develop diabetes ten years later, it is the fault of your choice of rice. Everybody knows that this is ridiculous but the data are there showing an almost 4-fold increased risk, so how can you argue with the numbers?

These kinds of studies are always based on associations and the authors are usually quick to tell you that association doesn’t mean causality even as they interpret the data as a clear guide to action (“Substitution of whole grains, including brown rice, for white rice may lower risk of type 2 diabetes.”). In fact, to most scientists, association can be a strong argument for causality. That is not what’s wrong with them. Philosophically speaking, there are only associations. All we really know is that there is a stream of particles and there is an association between the presence of a magnet and the appearance of a spot on a piece of photographic paper (anybody remember photographic paper?). God does not whisper in your ear that the particle has a magnetic moment. It is the strength of the idea behind the association and the presentation of the idea that determines whether the association implies causality. What most people really mean is that “association does not necessarily imply causality. You may need more information.” What’s wrong with the rice story is that the idea is lacking in common sense. The idea that the type of rice you eat has any meaningful impact by itself is absurd, as is the idea that one could even guess whether, within an overall lifestyle, that impact is positive or negative. But what about the statistics? Here the problem is really presentation of the data. The number of papers in the literature pointing out the errors in interpretation of statistics is very large, although it is still less than the number of papers making those errors. There are numerous problems and many examples but let’s look at the simplest case: limitations of reporting relative risk and alternatives.

Here’s a good example cited in a highly recommended popular statistics book, Gerd Gigerenzer’s “Calculated Risks.” He discusses a real case, the West of Scotland Coronary Prevention Study (WOSCOPS), comparing the statin drug pravastatin to placebo in people with high cholesterol. The study was started in 1989 and went on for about 5 years. (These days, I think you can only compare different statins; everybody is so convinced that they are good that a placebo would be considered unethical.)

1. First, the press release: “People with high cholesterol can rapidly reduce… their risk of death by 22 per cent by taking…pravastatin.”

2. Now, ask yourself what this means. If 1000 people with high cholesterol take pravastatin, how many people will be saved from a heart attack that might have otherwise killed them? Think about this, then look at the data, the data that should have been reported in the media.

3. The data:

Treatment      Deaths during 5 years (per 1000 people with high cholesterol)
pravastatin    32
placebo        41

Right off, it doesn’t look as good as you might have thought. Heart attack is a major killer overall, but if you take a thousand people and watch them for five years, not that many people die from a heart attack. Now, there are three standard ways of representing the data.

4. Data presentation – Relative risk reduction.

Risk is the number of cases divided by the total number of people in the trial (or risk per total number). So you calculate a risk for 1000 people on the drug = 32/1000 = 3.2 % and similarly for people on the placebo = 41/1000 = 4.1 %. Risk reduction for comparing treatments is the difference between the two risks. The relative risk reduction here is just the reduction in risk divided by the risk for the placebo:

Risk reduction (number of people saved per thousand) = 41 - 32 = 9. Saving 9 lives doesn’t sound that great, but let’s get the per cent as reported.

Relative risk reduction = 9/41 = 22 % as indicated, and it does sound like a big deal but there are other ways to look at the data.

5. Data presentation – Absolute risk reduction. Again, you start with risk, the number of cases divided by total number, but you calculate the actual fraction. The absolute risk reduction is the difference between these two fractions.

For pravastatin, risk = 32/1000

For placebo, risk = 41/1000

Absolute risk reduction = (41/1000) – (32/1000) = 9/1000 = 0.9 % (less than 1 %)

6. Data presentation – Number needed to treat (NNT): This is a good indicator of outcomes. If you treat 1000 people, 9 will survive who might have otherwise died. So,

number that you have to treat to save one life = NNT = 1000/9 ≈ 111 people.

7. Conclusion: 22 % risk reduction is true enough but it seems like it didn’t really tell you what you want to know. Cutting to the chase, would you take a statin if you had high cholesterol (more than about 250 mg/dl) and, as in WOSCOPS, no history of heart attacks? On the basis of this study alone, it’s not clear. First, the risk is low. There is clearly a benefit but how predictable is that benefit? In the study, 99 % of the people had no benefit. Of course, if you are the one out of a hundred, the drug would be a good thing. The question is not easy to answer but the point of what’s written here is that the statistics as reported in the media might have led you to jump to conclusions. Before you jump, though, you might ask about side-effects. This is a complicated subject because, although the side-effects are rare, their incidence is not zero and they can be severe, but this post is only about the statistics.
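The three presentations in steps 4 to 6 all come from the same two numbers. Here is a minimal sketch that derives each of them, plus the “no benefit” fraction from the conclusion, from the per-1000 death counts quoted in step 3. The variable names are illustrative; the only inputs are the 32 and 41 deaths per 1000.

```python
# Deaths over about 5 years per 1000 people with high cholesterol,
# as quoted above from Gigerenzer's presentation of WOSCOPS.
PEOPLE = 1000
deaths = {"pravastatin": 32, "placebo": 41}

# Risk: number of cases divided by the number of people followed.
risk_drug = deaths["pravastatin"] / PEOPLE   # 0.032
risk_placebo = deaths["placebo"] / PEOPLE    # 0.041

# Absolute risk reduction: difference between the two risks (about 0.009).
absolute_risk_reduction = risk_placebo - risk_drug
# Relative risk reduction: that difference divided by the placebo risk (about 0.22).
relative_risk_reduction = absolute_risk_reduction / risk_placebo
# Number needed to treat to prevent one death (about 111).
number_needed_to_treat = 1 / absolute_risk_reduction
# Everyone else: treated people whose outcome the drug did not change (about 0.991).
fraction_without_benefit = 1 - absolute_risk_reduction

print(f"Relative risk reduction: {relative_risk_reduction:.0%}")
print(f"Absolute risk reduction: {absolute_risk_reduction:.1%}")
print(f"Number needed to treat:  {number_needed_to_treat:.0f}")
print(f"Treated with no benefit: {fraction_without_benefit:.1%}")
```

The same 9-per-1000 difference reads as 22 % when divided by the placebo risk and as 0.9 % when left as an absolute difference, which is the whole point of the comparison.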