This week’s Green Journal has an important little article reporting poor inter-rater reliability among neurology faculty grading residents’ clinical skills (NEX) exams. The authors posit a few potential explanations, but this is the one that caught my attention:
. . . A much more straightforward interpretation of our results is that the current implementation of the NEX is inadequate. This could be either because evaluators have been inadequately trained or because of inherent flaws in the instrument . . . It remains possible that the NEX itself is simply not a valid assessment tool . . . Additional investigation is necessary.
Certainly, we want to ensure that graduating neurologists have achieved the requisite level of competence, but is it too much to ask that we validate the instruments we use for this purpose before implementing them? The lack of validation has several consequences:

- Competent residents may be falsely labeled as incompetent.
- Incompetent residents may be falsely labeled as competent.
- The public may lose confidence in our certifications.
- Setting up and grading the NEX exams uses up the valuable time of our patients, residents, and faculty.
In an example of how these things can spiral outward, there now exists a training program (info sheet here; download PowerPoint slides here) that attempts to standardize the grading of the NEX exams. This would be highly desirable if the NEX is found to be valid and the poor inter-rater reliability stems from a lack of training on the examiners’ part. But if the NEX is not a valid measure of competence, then no amount of training in its use will change that.
There’s a parallel argument to be made in the clinical quality arena. If, for example, 30-day readmission rates are not a valid quality measure (readmission risk being highly correlated with illness severity and socioeconomic status), then it may not be appropriate to focus a lot of resources on reducing that number per se. Mortality data provide an even better example. In this brilliant study, a mathematical model of high- and low-quality hospitals showed that even with perfect risk adjustment, fewer than 12% of low-quality hospitals would be identified as such, while 62% of hospitals identified as having unacceptably high mortality were actually of high quality (false positives). Or, as per a related editorial:
Most poor-quality providers quietly shelter in the body of the outcomes bell curve, whereas most hospitals with poor outcomes are “false-positives”, forced to embark on pointless reviews of their care, at considerable cost in time, effort, morale and reputation.
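The arithmetic behind this is just base rates: if truly low-quality hospitals are uncommon and the mortality flag is an imperfect test, most flagged hospitals will be false positives. Here is a minimal sketch of that calculation. The prevalence, sensitivity, and false-positive-rate values are assumptions chosen for illustration, not the figures from the cited study:

```python
# Base-rate illustration: why most mortality-flagged hospitals
# can be high quality. All numbers are hypothetical assumptions.

prevalence = 0.10           # assumed fraction of hospitals truly low quality
sensitivity = 0.30          # assumed chance a low-quality hospital gets flagged
false_positive_rate = 0.05  # assumed chance a high-quality hospital gets flagged

n = 1000                    # a cohort of hospitals
true_low = n * prevalence   # 100 truly low-quality hospitals
true_high = n - true_low    # 900 truly high-quality hospitals

flagged_low = true_low * sensitivity             # true positives: 30
flagged_high = true_high * false_positive_rate   # false positives: 45

# Positive predictive value: of the hospitals flagged as outliers,
# what fraction is actually low quality?
ppv = flagged_low / (flagged_low + flagged_high)
print(f"Flagged hospitals that are actually low quality: {ppv:.0%}")
# → 40%: under these assumptions, most flagged hospitals are false positives.
```

Under these illustrative assumptions, 60% of flagged hospitals are of high quality, which is the same qualitative picture the modeling study paints: the rarer true low performers are, the worse the flag's positive predictive value.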
Most everyone reading this provincial little blog knows that I’m quite passionate about providing high-quality care. My point here is just that we need to validate our quality measures before spending a lot of time and money on meeting them.