Double-Blind Studies

Is the Gold Standard Fool’s Gold?


Recently, my vet prescribed a new medication for my dog’s severe musculoskeletal pain. It didn’t seem to have any effect over the next week, so I asked for tramadol, which made a major difference within a couple of days. According to my vet, the research says the new drug is far more effective than tramadol and that tramadol does not ease musculoskeletal pain in dogs. In telling me this, my vet shrugged; other clients have reported the same experience: no effect from the expensive new drug but relief with tramadol. It makes me wonder about the tension between anecdotal evidence (e.g., patient reports) and the double-blind, placebo-controlled clinical trials that are considered the gold standard.

A 2009 Townsend Letter article by Dr. Abram Hoffer and E. Paterson made me aware of the shortfalls of this “gold standard.”1 Hoffer and Paterson explain that controlled trials were originally used in agriculture to investigate the result of adding a nutrient, such as potassium, to the soil; one field received the nutrient and the other did not, and crop yields were compared. “This technique was so successful that it was taken into human research, but there was and is one major problem: we are humans, not plants of wheat; and we display extraordinary variability,” they write. “With plant research, one does not have to count each plant as an individual. With mammals and humans, this is essential.”

Hoffer and Paterson also take issue with the arbitrary use of P ≤ 0.05, the cut-off for declaring a result statistically significant, set by statistician Sir Ronald A. Fisher (1890-1962) in his 1925 book Statistical Methods for Research Workers. In a recent TL letter to the editor, Douglas Lobay, ND, explains that this arbitrary threshold can be compromised, intentionally or unintentionally, by P-hacking.2 That is, researchers can perform multiple statistical tests on subgroups of the data in hopes of obtaining the Holy Grail of P ≤ 0.05. Drug makers also conduct multiple clinical trials until they get the desired P ≤ 0.05 result, which they can then use to convince the FDA (and clinicians) that their new product is effective.
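To see why testing many subgroups so easily produces a “significant” result, consider a small simulation. This is a hypothetical Python sketch of my own, not from the article or the cited letter: both groups are drawn from the same distribution, so the true drug effect is zero, yet checking enough arbitrary subgroups almost guarantees that at least one comparison crosses P ≤ 0.05 by chance alone.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_trials = 1000      # simulated "studies"
n_per_group = 100    # subjects per arm
n_subgroups = 20     # arbitrary subgroup analyses per study

hacked = 0
for _ in range(n_trials):
    # Both arms come from the same distribution: there is no real effect.
    treated = rng.normal(0.0, 1.0, n_per_group)
    control = rng.normal(0.0, 1.0, n_per_group)
    for _ in range(n_subgroups):
        # Test an arbitrary subgroup of 30 subjects from each arm.
        idx = rng.choice(n_per_group, size=30, replace=False)
        p_value = stats.ttest_ind(treated[idx], control[idx]).pvalue
        if p_value <= 0.05:
            hacked += 1
            break  # stop at the first "significant" subgroup, as a P-hacker would

print(f"Null studies reporting P <= 0.05: {hacked / n_trials:.0%}")

With 20 roughly independent tests at the 0.05 level, the chance of at least one false positive approaches 1 − 0.95^20, or about 64 percent, and the simulation typically reports a rate in that neighborhood, even though no effect exists at all.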

Hoffer and Paterson say, “The anecdote has been effectively vilified, even though it is the foundation of all advances in medicine without exception…. Progress in medicine would have been impossible without these anecdotes. Only recently has this term become pejorative, used as a way of attacking those clinicians who still believe that what they observe and record is important and that not every clinical study has to be double-blind. It is like claiming that the notes compiled by a doctor in his records about a patient are invalid.”

Anecdotal evidence provides real-world observations that can then be tested in rigorous clinical trials. For example, anecdotal reports of benefit from EDTA chelation in cardiovascular patients led to the TACT study and to chelation’s inclusion in the American College of Cardiology guidelines for treatment of coronary artery disease.

But what happens when “gold-standard” evidence does not concur with real-world, anecdotal evidence? Suppose my vet had shrugged and said, “Well, the studies say tramadol has no effect,” and insisted that I keep giving my dog the new drug? Should research trials usurp anecdotal reports?

Jule Klotter

  1. Hoffer A, Paterson E. The Emperor Has No Clothes: An Anecdote. Townsend Letter. April 2009;72-77.
  2. Lobay D. P-Hacking, Cherry-picking and Data Dredging. Townsend Letter. January 2020;78-79.