A Bayesian Approach to Ruling out Neurological Disease

As most readers of this blog are aware, I’m an amateur student of the psychology of decision-making (in fact, there’s a tag for this over on the right). Having just finished Nate Silver’s excellent book, The Signal and the Noise, I’ve got Bayes’ Theorem on my mind and have been thinking about how it can be applied in practice.

Composite case: A 72-year-old man presented with acute left-sided numbness and weakness, last known well 2 hours before. He has all of the usual vascular risk factors, and these are poorly controlled. The exam, however, is quite confusing. He’s agitated, combative, and disoriented. There is no clear neglect. Visual field testing is inconsistent across numerous attempts, sometimes suggesting a monocular problem on the left, sometimes a left-sided field cut, and sometimes no clear abnormality at all. Other cranial nerve functions are normal. He says that he can’t move his left side, but with encouragement he produces 3/5 strength consistently, and can occasionally generate 4/5 strength. He reports left-sided numbness. There are no reflex asymmetries or Babinski signs.

Non-contrast head CT shows extensive vascular calcifications, but no acute abnormalities.

Do we give tPA?

It’s common in neurology to obtain a confusing exam. Sometimes this is because we don’t recognize the pattern of deficits. I’m told that multiple sclerosis was frequently misdiagnosed as psychogenic in etiology, since its multifocality cuts against our usual imperative to “localize the [singular] lesion”. Sometimes difficulty arises because the patient overlays psychologically-mediated behaviors on top of organic deficits. And of course sometimes all of the exam findings are psychogenic.

If we interpret the exam above as suggesting a combination of organic and psychogenic deficits, we might withhold a definitive stroke diagnosis and tPA treatment. Can a Bayesian approach help us here?

The first step in a Bayesian approach is to estimate the prior probability of the diagnosis. If the patient were young and healthy, the prior probability would be very low. When the patient is old and has a plethora of poorly-controlled vascular risk factors, however, the prior probability is high. So, in the example above we have an elderly man with extensive vascular risk factors who presents with acute, lateralized neurological symptoms. And he has cerebrovascular calcifications seen on CT. I think it’s fair to estimate the prior probability of stroke at 90%.

We then modify our assessment in light of each new piece of information. Unlike many blood tests, we don’t have sensitivity and specificity data for every component of the clinical evaluation. The Gill paper references some older literature on the clinical evaluation of ascites, abdominal aortic aneurysm, etc. The main conclusion is that clinical examination is far from perfect. “Test” characteristics (sensitivity, specificity, etc.) for various exam maneuvers vary according to the severity of disease and other factors, but can be as low as 28% and as high as 90%.
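To make the updating step concrete, here is a minimal sketch of how Bayes’ theorem revises a prior probability after one piece of evidence, such as a “negative” exam. (The 90% prior and 70% sensitivity/specificity figures are the illustrative values used in this post, not established test characteristics.)

```python
def update(prior, sensitivity, specificity, finding_present):
    """Revise a disease probability with one piece of evidence via Bayes' theorem.

    sensitivity = P(finding | disease); specificity = P(no finding | no disease).
    """
    if finding_present:
        p_evidence_if_disease = sensitivity
        p_evidence_if_no_disease = 1 - specificity
    else:
        p_evidence_if_disease = 1 - sensitivity
        p_evidence_if_no_disease = specificity
    numerator = p_evidence_if_disease * prior
    return numerator / (numerator + p_evidence_if_no_disease * (1 - prior))

# Start at the 90% prior from the history and risk factors,
# then fold in a "negative" exam with an assumed 70% sensitivity/specificity:
p = update(0.90, 0.70, 0.70, finding_present=False)
print(round(p, 2))  # 0.79 — the probability of stroke despite the negative exam
```

The same function can be called repeatedly, feeding each posterior back in as the next prior, which is the sequential updating described above.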

The figure below illustrates the critical relationships among the prior probability of the disease, the test characteristics of the neurological exam, and the negative predictive value of a “negative” neurological exam (since in this example we’re trying to rule out stroke with our exam).

On the x-axis are the sensitivity and specificity of the neurological exam, which I’ve combined into one number as a simplification. (In reality, there is a trade-off between the sensitivity and specificity of a test, but here I’m more concerned with the basic relationships than with the precision of the numbers. Which is another way of saying that I’m not a statistician, but would welcome any corrections that are more precise.) If the sensitivity and specificity are 95%, it means that if a person has a stroke, we’re 95% likely to find it on exam, and if he doesn’t have a stroke, we’re 95% likely to correctly rule it out. If these are 70%, then we’ll make a clinical diagnosis of stroke only 70% of the time that the patient actually has one, and we’ll incorrectly say he didn’t have a stroke 30% of the time.

The y-axis is the negative predictive value of the exam. If this is 90%, it means that when our exam is “negative” for stroke, it is 90% likely that the patient in fact does not have a stroke and 10% likely that he does. If it is 20%, it means that it is 20% likely that he doesn’t have a stroke and 80% likely that he does.

The blue line shows the relationship between sensitivity/specificity and negative predictive value when the prior probability of stroke is 90%. The red line is for a prior probability of 50%. You can see that if our prior probability is high (blue line), the negative predictive value of our exam drops off sharply as the sensitivity and specificity of our exam goes down. When the prior probability is lower (red line), then even a less-than-perfect exam provides accurate results.
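The two curves are easy to regenerate. A quick sketch, where `accuracy` stands for the simplified combined sensitivity/specificity on the x-axis:

```python
def npv(prior, accuracy):
    """Negative predictive value when sensitivity = specificity = accuracy."""
    true_negatives = accuracy * (1 - prior)    # non-strokes correctly ruled out
    false_negatives = (1 - accuracy) * prior   # strokes the exam misses
    return true_negatives / (true_negatives + false_negatives)

for acc in (0.95, 0.90, 0.80, 0.70):
    blue = npv(prior=0.90, accuracy=acc)  # high-prior patient (blue line)
    red = npv(prior=0.50, accuracy=acc)   # 50/50 prior (red line)
    print(f"accuracy {acc:.0%}: NPV {blue:.0%} at 90% prior, {red:.0%} at 50% prior")
```

Note that at a 50% prior the NPV simply equals the accuracy, while at a 90% prior it collapses quickly: 68% NPV at 95% accuracy, but only 21% at 70% accuracy.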

So it seems to me that a key question we must ask ourselves is: what are the sensitivity and specificity of our exam maneuvers? We’re always taught that the exam reigns supreme: it’s the sine qua non of neurology, the queen of specialties. I’ll submit, however, that the accuracy of our exams likely depends on the situation. There are cases that are very clear, where the exam is likely to be highly reliable: think of a case of suspected myelopathy where the exam shows spastic paralysis of both legs with marked hyperreflexia, bilateral Babinski signs, and a sensory level at the umbilicus. Then there are cases that are not clear, like the example above, where the exam is hard to interpret. I suspect that we often fall into the trap of believing that our beloved neurological exam always has very high sensitivity and specificity when in fact there are situations where the findings are ambiguous and the accuracy is consequently lower.

Getting back to our example, let’s stubbornly insist that we’re 95% sensitive and specific in detecting and ruling out stroke on the basis of our exam. The negative predictive value of our negative exam would be 68%. That means that there’s a one in three chance that we’re wrong and the patient has a stroke after all. That might not be enough to convince us to give tPA, but I think it’s certainly humbling.

However, let’s indeed be more humble and estimate that in this situation, with an agitated, combative patient showing us some focal signs but a lot of inconsistent behaviors, our sensitivity and specificity are only 70%. Now, our negative predictive value is only 21%. Because of the powerful effect of the high prior probability, there’s a 79% chance that we’ve incorrectly excluded stroke as the diagnosis.

So, who wants to give tPA now? It’s very disconcerting to think about administering a potentially dangerous medication to someone you’ve just examined without reaching a clear understanding of where the lesion lies. Nonetheless, if the history and ancillary data available at the time are strongly suggestive (i.e., if the prior probability is very high), it may indeed be appropriate to render a stroke diagnosis anyway and administer the drug. Providing further reassurance is evidence suggesting that it is safe to administer tPA to patients with stroke mimics.

Credit: I made the graph above using this website.
The formula for the blue line is: y = 100 * (0.1 * x) / ((0.1 * x) + 0.9 * (100 - x))
and for the red line: y = 100 * (0.5 * x) / ((0.5 * x) + 0.5 * (100 - x))

About Justin A. Sattin

I'm a vascular neurologist and residency program director. I created this blog in order to share some thoughts with my resident and other colleagues, and to foster my own learning as well.
This entry was posted in Patient Care, Practice-Based Learning and Improvement.

3 Responses to A Bayesian Approach to Ruling out Neurological Disease

  1. Aaron Struck PGY-3 Neurology University of Wisconsin says:

    Thank you for bringing up this important topic. This kind of probabilistic reasoning is helpful in navigating the inherent uncertainty in clinical medicine. A great example is an HIV test with 90% sensitivity and 90% specificity. If you use this test in a population with a prevalence of, say, 0.001 (roughly that of Wisconsin), and the test is positive, the chance that the person actually has HIV is, surprisingly, only about 0.009, or roughly 1%. Even if the test had 99% sensitivity and specificity, a positive result would still imply only about a 9% chance of having HIV. If the person has certain risk factors (MSM, etc.), this changes dramatically.

    For most of the history of medicine the clinical impression was supreme, and the more experienced the clinician, the better the clinical impression. I read in an article once that physicians reach their peak productivity later in life than almost any other profession (usually in their 50s; by contrast, mathematicians peak in their 20s, physicists in their 30s, and swimmers in their teens). The reason it takes so long to become a great doctor is the reliance on experience. Experience essentially means honing the sub rosa probabilistic models inherent to the human mind: the more patients seen, the better the cerebral, data-driven model becomes. However, that experience is difficult to pass down (take, for example, the textbooks on my shelf: 1600 pages on epilepsy versus a 400-page textbook on classical mechanics). The medical trainee needs a large experiential component, significantly more than in other professions; hence residency, I guess. The other problem is the inherent bias in our innate decision models. These are well known to psychologists, examples being the excessive influence of the last event (or patient) and our natural risk aversion (why do they never go for it on fourth down when all the stats say they should?).

    I personally cannot foresee a future in medicine where physicians and health care organizations are not beholden to data-driven modeling. The evidence is already in Epic and in the accountable care organization concept. It is a matter of time before the bond between billing/liability and this kind of modeling becomes covalent. In some ways I am sure it will be beneficial, but it will also encroach on the autonomy of the physician. This frontier holds exciting opportunities to improve efficiency and patient care. But there is always a dark side. The use of algorithms blinds the end-user to the underlying sausage factory of data. No technique can protect against poor-quality data, whether in the data used to create the model or when the model is being used, and once something becomes the standard it is hard to change no matter how erroneous it is (especially if there are no competing models). Despite the shimmering facade of numbers and cold, hard empiricism, there are definite ways for the usual characters of politics and commerce to exert their influence (nicely noted in the Grand Rounds by Dr. Chappell: “why all trials will have a P=0.049”).

    There is always an inherent risk in generalizing data (the Procrustean dilemma), especially when the underlying science is not understood; that is the real limit of the empiric nature of clinical trials. (It took Einstein one solar eclipse to prove relativity; why can’t we agree on endovascular stroke thrombolysis despite millions of dollars and many trials?) Think about this theoretical situation: you run a clinical trial where you give everyone albendazole after a first-time seizure, with the outcome being the chance of developing a second seizure. In Latin America you would find clinical efficacy; if you tried to apply that finding in the US, you would be doing quite a disservice to our patients with epilepsy. The generalization would still be wrong even if you corrected for all the genetic, age, sex, and demographic factors other than geography.

    I hope these techniques are implemented with caution and humility and not caprice.


  2. dbhatti says:

    I have been going through all of the posts in the blog with great interest over the last week since being introduced to it by a fellow neurology resident. I love most of your posts.

    The idea of prediction and predictability is an interesting one. I was wondering if you are familiar with the work of Nassim Nicholas Taleb, such as The Black Swan, which also deals with the dilemmas of prediction, but in a different way.

    Movement Disorders Fellow, UNMC

    • Justin A. Sattin says:

      Hello, and thanks for joining us!

      I have read Taleb’s book and found it very interesting. I think that type of analysis is very suitable to some common issues in neurology. For example, it’s been asserted to me before that the risk of renal failure from cerebral angiography is negligibly low: “have you ever seen it?” Well, no, I haven’t seen it, but that doesn’t mean that it won’t happen some day and really harm someone.

Comments are closed.