Many thanks to Dr. Marcus Chacon for hosting journal club yesterday, and to Dr. Wendy Chacon for tolerating our presence in her house until well after the kids were in bed. Kudos to Dr. Cahill for presenting two of the papers on very short notice and to Dr. Alsherbini for taking on the third one. You both did a good job in somewhat difficult circumstances.
The first thing I wanted to follow up on is the PICO method. This acronym stands for Patient (or Population), Intervention, Comparison, and Outcome. It’s a useful way to formulate clinical questions and also to evaluate clinical research papers. An example pertinent to yesterday’s papers would be:
In patients with acute ischemic stroke presenting within 12 hours of last known well and having NIHSS scores between 5 and 25, does endovascular intervention with the Solitaire device, as compared to standard care, result in better neurological outcomes expressed on the modified Rankin scale?
The details of each of these can and should be carefully explored in any rigorous evaluation of a paper (eg, what’s “standard care”, exactly?), and I suggest that going forward, we use this format in any journal club regarding clinical studies.
Yesterday, we reviewed the three papers referenced in my post of February 10th. I found it difficult to guide us in a PICO-focused way because seemingly every statement made was quickly met with a vigorous defense of the endovascular perspective. If nothing else, we did succeed in identifying many weaknesses in the papers, and I’d like to use this opportunity to organize some of those criticisms in PICO format.
The first paper discussed was the SYNTHESIS study of IV tPA vs. endovascular treatment. A criticism of the subject Population was that they were not selected for the presence of a large vessel occlusion. To the extent that some subjects did not have such an occlusion, exposing them to the risk of endovascular intervention (catheterization and, in this study, the highly unusual regional IA tPA administration) without the possibility of benefit from large vessel recanalization biases the trial in favor of conservative therapy.
An additional criticism of the Intervention, beyond the regional IA tPA treatment in some subjects, was that the endovascular subjects did not receive IV tPA even though the study population consisted exclusively of patients presenting within the 4.5-hour window. In clinical practice, of course, we always give IV tPA to those eligible; the pressing question is whether to follow that with endovascular treatment or not.
A criticism of the Outcome measure was that limiting “favorable” outcomes to modified Rankin scale scores of, in this case, 0-1, prohibits the detection of a significant clinical benefit in someone who might have gone from Rankin 5 to Rankin 3.
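To make the dichotomization point concrete, here is a minimal sketch in Python. The scores are invented for illustration and are not taken from SYNTHESIS or any other trial; the function names are likewise hypothetical. It shows how two groups can have the same proportion of "favorable" (Rankin 0-1) outcomes even though one group is clearly less disabled overall.

```python
# Hypothetical 90-day modified Rankin scores (0 = no symptoms, 6 = dead).
# These numbers are invented for illustration, not taken from any trial.
control = [5, 5, 4, 4, 3, 1]
treated = [3, 3, 2, 2, 2, 1]


def favorable_fraction(scores, cutoff=1):
    """Fraction of subjects with a Rankin score <= cutoff (dichotomized outcome)."""
    return sum(s <= cutoff for s in scores) / len(scores)


def mean_score(scores):
    """Crude summary of the whole distribution.

    A real shift analysis would use an ordinal method; the mean is used
    here only to show that the distributions differ.
    """
    return sum(scores) / len(scores)


# Same proportion "favorable" at a cutoff of 0-1 in both groups...
print(favorable_fraction(control), favorable_fraction(treated))
# ...but the treated group is less disabled across the scale.
print(mean_score(control), mean_score(treated))
```

In this made-up example, the subjects who moved from Rankin 5 to Rankin 3 contribute nothing to the dichotomized result, which is exactly the concern raised above.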
The second study was IMS III. This study also did not select for Patients with documented large vessel occlusions. Unlike in SYNTHESIS, subjects randomized to endovascular therapy but lacking an occlusion at angiography did not receive any further treatment. Still, having subjects in the intervention group who don’t actually receive the intervention biases the study toward the null. (Think of it this way: What would happen if no subjects had large vessel occlusions? We’d be testing IV tPA vs. IV tPA and there should be no difference between the two arms of the trial.)
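The bias toward the null can be sketched with simple arithmetic. The following Python snippet is an illustration under stated assumptions, not an analysis of IMS III: it assumes that intervention-arm subjects who never receive the intervention do exactly as well as controls, and the 10-point "true benefit" and the treated fractions are hypothetical values.

```python
def observed_difference(true_benefit, fraction_treated):
    """Expected intention-to-treat difference in favorable-outcome proportions.

    true_benefit: absolute benefit among subjects who actually receive
        the intervention (hypothetical value).
    fraction_treated: share of the intervention arm that actually receives it.
    Assumes untreated subjects in the intervention arm do exactly as well
    as subjects in the comparison arm.
    """
    if not 0.0 <= fraction_treated <= 1.0:
        raise ValueError("fraction_treated must be between 0 and 1")
    return true_benefit * fraction_treated


# A hypothetical true benefit of 10 percentage points shrinks as fewer
# subjects in the intervention arm actually get the intervention.
for fraction in (1.0, 0.75, 0.5, 0.0):
    print(fraction, observed_difference(0.10, fraction))
```

The last case, where nobody is treated, is the IV tPA vs. IV tPA thought experiment: the expected difference is zero no matter how effective the intervention truly is.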
The main criticism of the Intervention was that because the trial was begun a decade ago, the devices used in the beginning were outdated by the end, and numerous protocol amendments were required to keep pace with the rapidly changing technology.
My criticism of the Intervention was that, for much of the trial, subjects in the endovascular arm received only 0.6 mg/kg of tPA, less than the standard of care. In my view, this is ethically OK as long as you’re offering something potentially better as part of the intervention. But since, as per the supplemental data (Figure S1, page 24), 100 of the 434 subjects randomized to endovascular care received either no angiogram at all or no endovascular treatment, my concern was that those subjects had to forgo standard care and didn’t even receive the possibility of further benefit in the intervention arm. (One of the later amendments changed the IV tPA dose in the endovascular arm to the usual 0.9 mg/kg.)
The final trial was MR RESCUE. The study Population was stratified according to the presence or absence of a “penumbral” pattern on CT or MR perfusion. There was strong criticism of the operational definition of penumbral imaging for being hard to understand, even by those highly experienced in the field. It was also asserted that the maximum allowable core infarct volume for the penumbral arms, 90 cc, was too high. In actuality, the mean predicted core infarct volumes in the two penumbral arms were 36-37 cc and the maximum was 58 cc. I forgot to mention this yesterday, but the investigators also performed a sensitivity analysis to determine if a different infarct volume cutpoint would have affected their results and concluded that it would not have.
The fireworks in this journal club were occasioned by the ending discussion of how to interpret these studies and whether they should change our practice. My assertion that, flawed as they are (and as all studies are to some extent), these trials “failed to show a benefit of endovascular therapy” was met by one enraged denial, accusation of bias against endovascular therapy, and the rebuttal that they in fact show that “the treatments are equivalent”. In the interest of civility, I didn’t push the matter too hard last night, but it bears further examination. First, here’s a quote from the SYNTHESIS paper:
The study was designed to verify or refute an absolute difference of 15 percentage points between the proportions of patients with a favorable outcome in the two treatment groups.
From IMS III:
We calculated that a sample of 900 patients would provide an effect size of 10 percentage points (the absolute difference between the endovascular-therapy and intravenous t-PA groups in the proportion of participants with a modified Rankin score of ≤2 at 90 days) . . .
From MR RESCUE (hyperlink added):
The primary study hypothesis was that the presence of substantial ischemic penumbral tissue and a small volume of predicted core infarct, as visualized on multimodal CT or MR imaging, would identify patients who were most likely to benefit from mechanical embolectomy . . .
The hypothesis was tested by analyzing whether the pretreatment penumbral pattern had a significant interaction with treatment assignment (embolectomy vs. standard medical care) as a determinant of functional outcome scores across all seven levels of the modified Rankin scale (shift in disability levels).
For secondary analyses, patients with scores of 0 to 2 on the modified Rankin scale were classified as having a good functional outcome.
For SYNTHESIS, the most rigorous way to state the primary result is that the trial failed to show a 15-percentage-point absolute difference in the proportion of subjects achieving a Rankin score of 0-1. IMS III failed to show a 10-percentage-point absolute difference in the proportion of subjects achieving a Rankin score of 0-2. MR RESCUE failed to show an interaction between the presence of a penumbral imaging pattern and the effect of embolectomy on the Rankin score. Its secondary analyses failed to show significant differences in the proportion of subjects (whether all subjects or only the penumbral subjects) achieving Rankin scores of 0-2.
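For readers who want to see how the size of the difference a trial seeks drives its design, here is a rough sketch of the textbook normal-approximation sample size formula for comparing two proportions. This is not the calculation the investigators used: the 40% control-arm rate, the two-sided alpha of 0.05, and the 80% power are all assumptions chosen for illustration, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate subjects needed per arm to detect p_treated vs. p_control.

    Uses the standard normal-approximation formula for two independent
    proportions with a two-sided test. All inputs here are assumptions.
    """
    if p_control == p_treated:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_treated - p_control) ** 2)


# Hypothetical 40% favorable outcomes in the comparison arm.
print(n_per_group(0.40, 0.55))  # seeking a 15-point absolute difference
print(n_per_group(0.40, 0.50))  # seeking a 10-point absolute difference
```

Under these assumptions, the smaller the difference sought, the more subjects are required, which is why a trial sized to find a 15-point difference says little about whether a smaller one exists.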
I suppose it is technically correct to state that since there were not significant differences in outcomes, then the treatment effects were similar. Clinically, however, that is absurd. When a new treatment becomes available, one that is expensive and invasive, it should be shown to be superior to cheaper, non-invasive treatments before it is widely adopted. Indeed, in their justification of the effect size they chose to seek in the SYNTHESIS trial, the authors state that it was based in part on “the need for a clinical effect sufficiently large to justify the switching from a well-established and simple procedure to one that is newer, more expensive, and more difficult to perform”. In this sense, all three trials have failed.
The more reasonable question is: to what extent should these results alter our practice? After all, we just finished picking the studies apart. But we can also pick apart DEFUSE, DEFUSE 2, DIAS, DEDAS, EPITHET, and other studies supporting the hypothesis that sophisticated imaging selects patients well-suited to IV and endovascular treatments in expanded time windows. When those studies, some randomized but early phase and others observational, made up the bulk of our evidence base, I think it was reasonable to be cautiously optimistic about the promise of multimodal imaging and endovascular treatments and to treat patients accordingly. For me, that meant disclosing to patients and families the still-uncertain status of such treatment approaches, but generally being quite willing to offer them.
Now that we have three (imperfect) randomized phase III trials, I think that we need to be more cautious. I wouldn’t throw out all of the foundational work noted above and use only these 3 trials to guide us, but these trials must have some effect. They must cause us to be more cautious in holding out multimodal imaging and endovascular treatment as being beneficial to our patients. We must disclose to our patients that notwithstanding our sincere hope that such treatments will ultimately prove beneficial, especially with our talented interventionalists, top-notch imaging, and latest devices, we actually don’t know for sure and indeed have recent evidence (yes, I’ll say it again) that failed to show benefit. To simply reject all three trials as flawed and not alter our practice at all would be irresponsible.
Again, what we really need to do is get involved in one of the ongoing or upcoming trials of these treatments . . .
Correction: 2/15/2013 @ 1803: I edited this post to correct an error in my analysis of IMS III. The 0.6 mg/kg tPA dose was in the intervention arm of the study–not the comparison arm as originally stated.