One of the arguments I often see used to quickly disregard published evidence, particularly trials with negative results, is the idea that internal validity comes at the expense of external validity — essentially, that the trial is “too controlled”, is not representative of the patients actually seen in the clinic, and does not account for certain variables. This is absolutely true; there are plenty of issues with generalizing from research into practice. However, I am not convinced that this general argument is compelling enough to disregard the results of trials studying particular treatments.
When a trial is designed to determine a treatment’s efficacy (that is, does the treatment have a specific effect?), the inclusion and exclusion criteria are often carefully curated to recruit “ideal patients” — the people most likely to benefit from the treatment being studied. These are patients who are not too young and not too old, are not terribly sick, are not taking too many medications, are not using other treatments that might mute the studied treatment’s benefits, are fortunate enough not to have any significant co-morbidities, have the means to travel to and from the numerous study appointments, and can afford to possibly miss work because of the time requirements of the study. The same is true in effectiveness trials, though perhaps to a lesser extent, given the looser control that comes with answering the question “is this treatment effective in the real world?”
For example, if you were to conduct an efficacy trial on therapeutic cabbage rubbing for chronic low back pain your inclusion and exclusion criteria might look something like this:
Subjects were recruited from a private practice clinic in Salt Lake City, UT from January 1801 to February 1802. To be eligible, patients (of either sex) needed to be aged 18 to 60 years and have experienced chronic low back pain for at least 3 months. Patients were excluded if they were pregnant, demonstrated unilateral or bilateral radicular symptoms, had confirmed serious spinal pathology (including fracture, tumor, or otherwise), had confirmed nerve root compression, had received treatment for their low back pain in the last six weeks (including injection, surgery, or other physical therapy), or had a history of cancer and/or cardiorespiratory disease.
Hopefully you can see that these are the kinds of patients that give a treatment the best chance to demonstrate any sort of benefit. So when a trial with “ideal patients” fails to demonstrate any large or meaningful effects, this is particularly telling. It would be foolish to say, “well, this trial that showed no effect for therapeutic cabbage rubbing in chronic low back pain didn’t really study the type of patients I see in my clinic, so the results do not apply to my practice” — as if it is somehow more likely that your patients who are 65 years old, pregnant, and have a history of cancer, bilateral radicular symptoms, and serious spinal pathology will respond to the treatment. Though I will concede: if you treat wealthy patients who have an unreasonable amount of love for cabbage, you might be able to make that argument.
There are specific cases where the results might not actually apply to your patients, but there must be some significant reason and compelling rationale to support this position (and even then, it should still be tested). Without this, we could simply disregard almost all published evidence as not being applicable. Research does not necessarily seek to exactly replicate the clinic, but instead to answer specific questions and contribute to understanding — with varying levels of apparent clinical applicability. It is then up to the clinician to leverage their expertise in integrating and applying research results in real-world situations. Simply stating that results do not apply because they are too controlled and not representative does not reflect the standard of expertise we should hold ourselves to.
There are numerous examples of treatments that demonstrate efficacy but fail to demonstrate the same magnitude of effectiveness. This is because translating and generalizing the benefits seen under the “perfect”, controlled conditions of a trial to the real world is really hard. One such example is exercise. Exercise demonstrates efficacy for improving the primary and secondary complications of things like falls, cardiovascular disease, pulmonary disease, diabetes and obesity. But in the real world, it has a lot of trouble with effectiveness. This is not because exercise does not “work”, but because implementing and adhering to properly dosed, regular exercise is difficult. People have complex social situations, numerous co-morbidities and medications, individual stressors, variable motivation and other biopsychosocial factors that limit the real-world effectiveness of exercise.
I have also seen the criticism that efficacy trials do not account for things like “the atmosphere of the clinic” or “the demeanor of the clinician” — but that is the point. When studying the efficacy of therapeutic cabbage rubbing, you are interested in the specific effects of therapeutic cabbage rubbing — not the effects of Brad Pitt rubbing cabbage on someone’s sore back in a relaxing spa with their favorite music playing. That is a different research question. If a treatment can reasonably be shown to have no efficacy, any real-world effectiveness can reasonably be assumed to have nothing to do with the treatment itself.
To state that trials do not represent the patients you see in the clinic would likely be correct. To use the fact that trials do not perfectly represent your patients as a reason to ignore the results, while failing to provide any plausible reason why your patients are more likely to respond to cabbage rubbing (or any other treatment) than the “ideal patients” often studied, would be wholly incorrect. Further, if you think the failure to account for things like the cheery disposition of the clinician is the reason a particular treatment did not hold up in rigorous testing, perhaps consider the alternative — the benefits might have nothing to do with the treatment itself and everything to do with how clinicians respectfully and empathetically care for other human beings. That is powerful information, and something we should leverage; we just do not need to keep using bunk treatments to do so.